Humanity’s forgotten family


Hominin fossils discovered near the site of the ‘hobbit’ Homo floresiensis provide yet more 
evidence that the human lineage is more diverse than was ever imagined. 


Arthur C. Clarke wrote in 2001: A Space Odyssey that behind
every person now living stand 30 ghosts, for that is by how
many the dead outnumber the living. That was in 1968 — the
number reckoned today would probably be greater. The human
lineage diverged from that of chimpanzees some 5 million to
7 million years ago. Were we able to mark the remains of all our
ancestors from that point, the world would be one enormous cemetery.

The most likely fate of any living organism is dissociation into its
component molecules, if not reabsorption as food into something else.
That makes the chance ineffably remote that the remains of any one
individual will be fossilized in any recognizable form, and, this
having been achieved, be recognized as such by a passing palaeontologist
before the fossil, too, crumbles to dust.

It is possible that many human species once existed, but became
extinct with such finality that even those few that were fossilized have
since disappeared, leaving absolutely no trace that generations of a
distinct species lived and died on this planet — a kind of double
extinction, without hope of memorial or discovery. Fossils from the human
lineage are scarce, and, given the numbers that must once have lived,
the percentage recovered must hardly be significantly different from
zero. (You can read about some of those that have been found in our
Nature collection at go.nature.com/1zjssjs.)


LONG-LOST RELATIONS 

Hence the surprise when, in 2004, a group of scientists in Indonesia and 
Australia announced the discovery of what became known as Homo 
floresiensis, a species of unusual, dwarfed hominin — that is, a creature 
living or extinct more closely related to us than to chimps — whose 
remains were found in Liang Bua cave on the island of Flores in Indo- 
nesia (P. Brown et al. Nature 431, 1055-1061; 2004). 

There was some doubt at the time that H. floresiensis represented a 
real species rather than a variant of modern humans affected by some 
disease or pathological condition, but this dissent was gradually eroded, 
not only by a long palaeontological record at Liang Bua, but also by a 
rich archaeological record in the island's Soa Basin, some distance to the 
east, showing that hominins of some sort had lived in the region for up 
to one million years (A. Brumm et al. Nature 464, 748-752; 2010). Yet
direct evidence, in the form of bones and teeth, was elusive. Until now. 

On page 245, researchers report a fragment of mandible and six iso- 
lated teeth of hominins from Mata Menge in the Soa Basin that they 
describe as similar to those of H. floresiensis, but more primitive in some 
respects and — if anything — even smaller. In an accompanying paper 
on page 249, they show that the remains were deposited 700,000 years 
ago, many thousands of years before those from Liang Bua. 

The researchers take the appropriately cautious and parsimonious 
view that these hominins were most closely related to early Asian Homo 
erectus, on the grounds that this is the only species of hominin
otherwise known to have inhabited that part of the world at that time
(see page 164). However, it remains possible, as an accompanying News &
Views on page 188 explains, that these creatures might represent some
very early, pre-H. erectus exodus from Africa. If so, that expands our
ignorance from a barely manageable ocean into a gulf of interstellar
magnitude, implying that a wholly unknown plethora of hominins
lived in Eurasia millions of years earlier than anyone suspected, just one
of whose number has been found in the region’s southeastern extremity
to betray the possibility that such an array of species ever existed.

Is this unwarranted speculation? Perhaps not: the discovery of
H. floresiensis prompted a sea change in palaeoanthropologists’
attitudes to the unknown. Researchers are less eager than they once
were to string fossils together into confident chains of ancestry and
descent. They are more likely to reappraise the various oddities of
human evolution, no longer dismissing them as fossils
that are hard to fit into the current paradigm of ancestry and descent,
but seeing them as representatives of entirely unsuspected branches 
of the human family tree. One thinks of the many hominin remains 
recovered over the past few decades from China, some of which do 
not quite fit into any current species. Or of Homo heidelbergensis, an 
increasingly unwieldy catch-all for hominins from the Middle Pleis- 
tocene epoch (781,000-126,000 years ago); or of H. erectus itself, a 
grouping of such variety that some have found it hard to accept that 
all the fossils ascribed to it comprise a single species. And there are 
others less familiar, such as skulls from Iwo Eleru in Nigeria that look 
surprisingly archaic, despite being assigned a relatively recent date of 
as few as 11,700 years ago (K. Harvati et al. PLoS ONE 6, e24024; 2011).

Studies on human DNA, ancient and modern, have reinforced this 
trend. The recovery of an entire genome of a hitherto unknown archaic 
hominin from a single finger bone from Denisova Cave in Siberia 
was — and is — an astonishing achievement, both for the discovery itself 
and for its implications (D. Reich et al. Nature 468, 1053-1060; 2010). It 
reinforces hints that the scarcity of human fossils belies what might once 
have been hitherto unimaginable diversity. The finding, reported in the 
same paper, that Denisovan DNA lives on in people from southeast Asia 
and the western Pacific, just as Neanderthal DNA survives in Eurasians 
generally, proves that fossils tell us much less than we would like of the 
human career. And as with Iwo Eleru, so with DNA: there are signs that 
the genomes of some modern Africans contain elements derived from 
archaic hominins not found in the fossil record (M. F. Hammer et al. 
Proc. Natl Acad. Sci. USA 108, 15123-15128; 2011). 

These early human relatives left signs of their passing as evanescent
and enigmatic as the Cheshire Cat from Alice’s Adventures in
Wonderland — slowly fading from view, with just its smile hanging on,
until that, too, disappears.
Energy hit


Germany’s decision to slow the expansion of 
green-energy production is a reasonable move. 


For a few hours around lunchtime on a bright and windy day
last month, Germany paid people to use electricity. The country’s
investment in renewable energy has paid off so handsomely that green
power sources produced almost enough electricity to meet national
demand. Because the rest of the energy infrastructure — nuclear, coal
and gas stations — was online too, the temporary surplus pushed prices
on Germany’s spot market on 8 May into the negative. For a brief time,
the more electricity that commercial customers used, the more money
they made.

Just as the introduction of negative interest rates — banks charging
customers to deposit money — has prompted more questions about
the state of the world’s economy, so the events of 8 May have been used
to throw stones at Germany’s energy transition, the Energiewende.

Critics of the Energiewende blame the generous support for
renewables — including a guaranteed above-market price for producers,
and grid priority for wind and solar power — for such grotesque market
distortions and for Germany’s relatively high household electricity
bills. Indeed, private consumers in the country pay more than
€20 billion (US$23 billion) in annual surcharges for the fixed feed-in
tariffs that go to individual producers. In response to criticism,
the government last week agreed to slow the scale and pace at which
Germany will further expand renewable energy over the next decade.

Electricity that is produced from renewable sources (including
hydropower) has tripled in Germany over the past decade, and now
provides (on most days) almost one-third of domestic electricity
generation. The 19% growth in renewable-power generation last year was
the largest in at least a decade. In terms of renewable-power capacity,
Germany is third in the world, behind China and the United States
(see Trendwatch, page 157). But at 1.1 kilowatts per capita, the
92 gigawatts that the nation produced in 2015 represents more than twice
the renewable power per capita of any other large economy.

The planned amendment of Germany’s renewable-energy act, agreed
in principle last week, sets a 45% cap on the amount of renewable
electricity generation by 2025. And, as demanded by Brussels, future 
promotion of wind and solar energy will be linked to tenders, favouring 
producers who generate renewable power at the lowest price.
Unsurprisingly, the plan has upset parts of the renewable-energy
sector — in particular, smaller companies, and millions of homeowners
who have invested in lucrative rooftop solar panels — that fear the loss
of market opportunities and revenue. Meanwhile, green lobbyists say the
reforms are a concession to the fossil-fuel sector that could send a
false signal to countries that regard Germany as a role model. Overall,
it might seem that the Energiewende — and, by extension, renewables
investment elsewhere — has failed, or is falling from political
grace in troubled times.

That is wrong. Opinions will rightly differ on which fiscal and policy 
measures are best suited to promoting renewables. But controversies 
of microeconomics will not derail the grand decarbonization project 
that is under way. 

What Germany’s case does illustrate is that this transition must 
be backed by comprehensive energy plans. To incorporate fast- 
growing, decentralized power generation into electricity grids requires 
improved networks, reliable tools to predict supply and demand, 
efficient storage and more-flexible conventional plants. 

Science can help. In April, the German federal government approved 
a 10-year, €400-million programme for research on the technol- 
ogy — smart grid technology and energy storage, for instance — that 
will be needed for an energy system and a market dominated by renewable
sources. Funders should ensure that projects address the problems
faced by suppliers and users of electricity under real-world market 
conditions. Sceptics may celebrate the wrinkles as renewables bed into 
the energy market, but the long-term trend has been set. 

Of course, Germany and other countries will continue to depend 
on coal and gas to meet electricity demands for at least a couple of 
decades — but then, massive subsidies ($500 billion globally in 2014, 
against $135 billion for renewables) mean that fossil-fuel power con- 
tinues to be offered at a knock-down price. A realistic, climate-friendly 
energy system for the future demands greater reform of the oil and coal 
markets, rather than changes to the renewables sector.


Second chances 


The line between compliance and misconduct 
is finer than you might think. 


There are many reasons why a research paper could be retracted,
although the one that tends to dominate public discussion is
research misconduct.

Similarly, there are many different definitions of research
misconduct, but the one that tends to draw attention is deliberate
deception and data fraud. That can help to explain why, when Nature ran
a news story in January 2013 about a new course that would attempt
to rehabilitate misconduct offenders, many of the online comments
below the story were negative (see Nature 493, 147; 2013).

One response was typical: “If a scientist wilfully pollutes the
scientific record with damaging, self-serving, purposeful lies, it is too
late. If they have received funding for that fraud, I am still mystified
as to why the granting agency does not seek to recoup their costs in
court. Now, someone suggests we further train them on something
that is obvious to any scientist with half a brain on their shoulders?”

More than three years on, the architects of the rehab course offer
a progress report, which appears as a Comment article on page 173.
More than three dozen researchers have been through its doors, and, 
according to the authors, most leave as better scientists than when 
they arrived. 

Attendees do not include the high-profile data fraudsters whose 
offences are so serious that they are fired. By definition, researchers 
on the course are scientists who have been caught out but whom 
institutions want to keep. 

Most of them had seen their research privileges suspended — 
with offences ranging from plagiarism and poor oversight to falling 
foul of the rules and regulations on animal welfare and informed 
consent. Despite the ‘research misconduct’ label, instances of
conscious wrong-doing were rare. As one participant said: “Prior to this
situation, I tried to follow the spirit of the law. Now I try to follow the
letter of the law.”

Two points stand out. First, the typical character and personality of 
these scientists, and their knowledge and attitudes, were no different 
from those of you and your colleagues. Misconduct, the authors say, 
can be down to circumstance: “we believe that most researchers may 
be susceptible”. And second, those circumstances are becoming more 
common. 

The most common cause of an offence was a lack of attention, 
prompted, among other things, by being too busy and trying to juggle 
too many projects. Sound like anyone you know?
Gene editing can drive
science to openness


The fast-moving field of gene-drive research provides an opportunity to
rewrite the rules of the science, says Kevin Esvelt.


The emergence of gene-drive systems — which spread
engineered mutations quickly through populations — means
that a single released organism could eventually alter most of
its local population, and quite possibly all populations of the species
throughout the world. Any accidental release, even if there was no
ecological damage, would surely damage public trust and prompt
harsh restrictions on research.

The US National Academy of Sciences released guidelines this
week for the responsible conduct of gene-drive research. The report
comes almost two years after the first published description of how
the CRISPR-Cas9 genome-editing technology could enable gene
drives in many different organisms. That’s a fast turnaround for the
academy, but an eternity for the field: in that time, scientists have
demonstrated CRISPR-based gene-drive systems in four species.

The report makes some sensible suggestions, such as phased testing
and ecological-risk assessments, but if we're going to develop proper
safeguards for gene drives or other powerful technologies, we need to
fix a greater problem: the closed-door nature of science.

No one would rationally design the current scientific enterprise. It is
wasteful and inefficient. Researchers repeatedly run into the same
problems and unknowingly duplicate efforts. It stunts collaboration: we
never learn who has the other piece of a puzzle unless we run into them
at a conference. It wastes time on endless grant-writing. It’s terrible
for researcher well-being: competitive pressure ruins playful discovery
and creation.

And it’s unsafe. Regulation will always be too slow. Science is too vast
for researchers to reliably foresee the consequences of their work. The
problem was neatly summarized by atom-bomb pioneer Robert Oppenheimer:
“When you see something that is technically sweet, you go ahead and do
it, and you argue about what to do about it only after you have had your
technical success.”

Some technical successes are not to be pursued. But others are
desperately needed. How can we hope to tell the difference when
science is done behind closed doors?

There are signs of progress. My colleagues and I publicly discussed
the probable consequences of a CRISPR-based gene drive before doing
any experiments. And many gene-drive researchers have already worked
together to improve safety and call for transparency. But this has been
done on an informal basis. For example, my group saw a gene-drive
paper by another laboratory and was able to suggest changes — the need
for extra safeguards to prevent an accidental release — but only because
we received an in-press copy of the publication from a journalist.

Sadly, open and responsive science flies in the face of current
incentives. Scientists who disclose their ideas are often ‘rewarded’ by
being
scooped by another lab, rather than by being recognized for their
creativity. It is a prisoner’s dilemma. The benefits come from coopera- 
tion by everyone. But by participating you risk being exploited by people 
who steal your idea, get it working before you do, and claim the credit. 

Gene-drive research offers a way out. The field is new and small, 
and many of us have already worked together to publish a joint rec- 
ommendation calling for future experiments to use multiple stringent 
confinement strategies. Several groups already disclose proposed and 
ongoing gene-drive research and invite feedback, and active discussions 
between researchers and funders seek ways to ensure that everyone will 
be similarly forthcoming. 

My group and others will soon launch the Responsive Science Project 
to enable gene-drive scientists to share their plans and research with one 
another and with interested communities. We hope that it will become 
a central repository of ideas and information rel- 
evant to gene-drive research that will permit open 
assessment and critique before experiments begin. 

Journals could help by offering incentives 
to persuade scientists to share their proposals. 
When a paper is published by authors who didn’t
play by the new rules (to share what they’re doing 
and collaborate with the people who first shared 
the key ideas), journals could check the reposi- 
tory to identify scientists who deserve a share of 
the credit and invite them to write an accompa- 
nying piece. Similarly, all funders should require 
immediate public disclosure of proposals involv- 
ing gene drives, as well as regular public updates 
on the status of funded research. 

If this attempt at open science works for one 
field, it could expand to encompass research on 
other shared-impact technologies and to fields beyond. That alone is 
reason enough to try the approach. But gene-drive technology is also 
unique in that its very nature demands a new approach. 

Because the consequences of mistakes involving gene-drive organ- 
isms could affect communities outside the laboratory, scientists have 
an obligation to openly share their plans, invite suggestions and con- 
cerns, disclose experimental results as soon as possible, and redesign 
the technology as needed. Applied to gene drives, such an approach 
will also have a greater chance of earning popular support for appli- 
cations that could save millions of human lives and rescue numerous 
species from extinction. 

We should ensure that gene-drive research is open and responsive — 
then drive those changes through the scientific ecosystem.


Kevin Esvelt is leader of the Sculpting Evolution group at the MIT 
Media Lab, Massachusetts Institute of Technology, Cambridge, 
Massachusetts. 

e-mail: esvelt@mit.edu 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


MOLECULAR BIOLOGY 


CRISPR tweaked 
to edit RNA 


The CRISPR-Cas9 gene- 
editing system snips DNA, but 
a newly characterized version 
targets RNA instead. 

The CRISPR-Cas system 
is used by many bacteria 
to combat viruses. Feng 
Zhang of the Broad Institute 
of MIT and Harvard in 
Cambridge, Massachusetts, 
Eugene Koonin of the US 
National Institutes of Health 
in Bethesda, Maryland, 
and their colleagues mined 
this natural diversity for 
alternatives to the DNA- 
cutting Cas9 enzyme. They 
found that an enzyme called 
C2c2 from the bacterium 
Leptotrichia shahii can be 
programmed to cut specific, 
single-stranded RNA targets 
in another bacterium, 
Escherichia coli. 

With further tweaks, the 
system could be used to 
attach fluorescent tags to 
RNA, direct RNA to specific 
compartments in the cell or 
otherwise chemically modify 
RNAs to study their function. 
Science http://dx.doi.org/10.1126/science.aaf5573 (2016)


Magma pool under
New Zealand


Molten rock is accumulating in a magma chamber beneath New Zealand,
raising questions about volcanic hazards.

Ian Hamling and his colleagues at GNS Science in Lower Hutt, New
Zealand, used satellite radar data to study ground motions in the
Taupo Volcanic Zone, an area of high volcanic activity. They found one
region, adjacent to this area, where the ground rose by around
5 millimetres per year from the 1950s onwards. That rate more than
doubled to about 12 millimetres a year in the mid-2000s, and has since
dropped back to the lower rate. Calculations suggest that about
9 million cubic metres of magma pushed its way into the crust each year
during peak growth, about 10 kilometres below the surface.

It’s not clear whether the magma chamber will increase the risk of
volcanic eruptions.
Sci. Adv. 2, e1600288 (2016)
For more on this research, see go.nature.com/28ew7kh


GLACIOLOGY


Early signs of ice retreat


Two studies show that Antarctica has been losing ice for longer than
previously thought.

A team led by Shujie Wang at the University of Cincinnati
in Ohio studied recently declassified images taken by US
spy satellites. They found that glaciers feeding the Antarctic
Peninsula’s Larsen B Ice Shelf (pictured in 2000) were already
accelerating towards the sea between 1963-79 and 1979-86,
long before the shelf’s spectacular collapse in 2002.

In separate work, Frazer Christie at the University of
Edinburgh, UK, and his colleagues used scientific satellites to
confirm that ice has been retreating along the coast of West
Antarctica since at least 1975.
Geophys. Res. Lett. http://doi.org/bjm3; http://doi.org/bjm4 (2016)


AGEING
Chemical extends
worm lifespan


A chemical lengthens the
nematode worm’s lifespan by
interfering with its perception
of whether food is present.

Model organisms are known 
to live longer when they are 
fed a restricted diet. Mark 
Lucanic and Gordon Lithgow 
at the Buck Institute for 
Research on Aging in Novato, 
California, and their colleagues 
screened 30,000 synthetic 
compounds and found several 
that extended the lifespan of 
the nematode Caenorhabditis 
elegans. The most potent, NP1, 
mimicked the effects of dietary 
restriction by masking the 
activity of a sensory pathway 
that normally signals that food 
is abundant. The chemical 
does this by boosting signalling 
of a specific neurotransmitter 
called glutamate to the 
pharynx, which in nematodes 
is a tube-like organ that pumps 
food into the gut. 

Further investigation of 
nutrient-sensing pathways 
could identify other life- 
extending chemicals, the 
authors say. 

Aging Cell http://doi.org/bjhh 
(2016) 


Relativity passes 
black-hole test 


General relativity holds true, 
even under the extreme 
conditions of colliding black 
holes. 

In 2015, the Advanced Laser 
Interferometer Gravitational- 
Wave Observatory (LIGO) 
saw the first evidence of 
gravitational waves, which had 
been created by two merging 
black holes. Walter Del Pozzo at 
the University of Birmingham, 
UK, and his colleagues on the 
LIGO collaboration and its 
European partner, the Virgo 
collaboration, compared the 
signal with those predicted by 
simulations based on general 
relativity. The teams found that 
the observations matched the 
predictions to a high degree, as 
they had in previous tests 
under much weaker
gravitational fields. This was 
the first direct test of Einstein's 
theory of general relativity 
under such extreme space- 
time warping and fast-moving 
conditions. 

With planned boosts to the 
LIGO detectors’ sensitivity, 
future observations could be 
used to test other theories 
of gravity and hypothesized 
alternatives to black holes, say 
the authors. 

Phys. Rev. Lett. 116, 221101 
(2016) 


MARINE SCIENCE
Plastic pollution 
hurts perch 


Tiny fragments of plastic in 
the ocean could change fish 
behaviour and decrease their 
survival. 

Research indicates that the world’s oceans are polluted with many
thousands of tonnes of ‘microplastic’ debris — particles measuring
less than 5 millimetres in diameter. Oona Lönnstedt and Peter Eklöv at
Uppsala University in Sweden exposed European perch (Perca fluviatilis)
to levels of microplastic similar to those found in the environment.
Although 96% of fertilized eggs not exposed to plastic hatched, only
81% of those placed in water with high levels did so. Moreover, 46% of
fish larvae that had been raised in a tank containing plastic-free water
were still alive after 24 hours in a tank with a predatory pike, whereas
100% of those raised with high levels of plastic were eaten within
16 hours.

Larvae reared with high concentrations of plastic did not show
anti-predator responses such as freezing or reduced movement when
exposed to alarm signals from other animals.
Science 352, 1213-1216 (2016)


Ancient dog DNA 
shows dual origins 


The first complete genome 
sequence of an ancient dog 
suggests that dogs were 
independently domesticated 
twice, in two different regions. 
Researchers have debated 
whether domestic dogs 
originated in Asia or Europe 
about 15,000 to 12,500 years 
ago. Laurent Frantz of the 
University of Oxford, UK, 
and his team sequenced 
the mitochondrial DNA 
of 59 ancient dogs and 
the complete genome of a
4,800-year-old dog from 
Ireland. They also analysed 
DNA from hundreds of 
modern dogs and wolves, 
and found that populations 
of Western European and 
East Asian canines diverged 
several millennia after the first 
appearance of the animals. 
Owing to a lack of 
archaeological evidence 
of ancient dogs between 
these regions, the authors 
propose that the animals were 
domesticated separately in 
Western Europe and East Asia 
from distinct wolf populations. 
Science 352, 1228-1231 (2016) 
For more on this research, 
see go.nature.com/1pzqqvr 


Age robs monkeys 
of vocal control 


Monkeys lose the ability to 
consciously control their 
calls as they age, which may 
have limited the evolution 
of language in non-human 
primates. 

Steffen Hage and his 
colleagues at the University 
of Tübingen in Germany
studied the vocalizations
of two male captive rhesus
macaques (Macaca mulatta;
pictured) over a roughly
five-year period. The
monkeys were trained
to produce a specific
call in response
to a coloured
cue to receive a
reward. At five
years old, the macaques scored
highly, but by eight years of age 
neither monkey could perform 
the task. Adult macaques 
still produced spontaneous, 
instinctive calls in their 
enclosure, indicating that they 
maintained vocal ability. 
Language may have evolved 
in humans by first extending 
the vocal flexibility of juveniles 
into adulthood. 
J. Exp. Biol. 219, 1744-1749 
(2016) 


BOTANY 


How desert moss 
drinks from air 


Researchers have revealed 
minuscule features on the 
leaves of a common desert
plant that allow it to collect 
water from moist air. 

Syntrichia caninervis 
(pictured) is a small moss that 
lacks roots. To understand 
how it uses its leaves to capture 
moisture, Tadd Truscott of 
Utah State University in Logan 
and his co-workers altered the 
relative humidity in their lab 
and used high-speed cameras 
and electron microscopy to 
study the plant’s response. 
They found that a hair-like 
structure called an awn at the 
tip of each leaf has grooves 
and barbs that collect and 
transport water. Nanogrooves 
about 200 nanometres wide 
allow water from humid air to 
nucleate on the awn’s surface, 
and larger microgrooves 
collect bigger water droplets 
from fog. 

Small barbs along the cone- 
shaped awn provide places for 


droplets to collect before being 
transported down the awn to 
the leaf. 

Nature Plants http://dx.doi.org/10.1038/nplants.2016.76 (2016)


INFECTION 


Bacterium could 
curb malaria 


West African mosquitoes 
infected with the bacterium 
Wolbachia are less likely than 
uninfected ones to carry the 
malaria parasite Plasmodium. 
Wolbachia infection has 
long been proposed as a way to 
reduce the spread of mosquito- 
borne diseases such as malaria. 
To study natural Wolbachia 
infection, Flaminia Catteruccia 
at the Harvard T. H. Chan 
School of Public Health in 
Boston, Massachusetts, and 
her colleagues collected and 
studied 221 Anopheles coluzzii 
mosquitoes from a village in
Burkina Faso. They found that 
about half of the insects carried 
a Wolbachia strain. Only one 
infected mosquito (less than 
1%) was also infected with 
Plasmodium, whereas roughly 
10% of the 105 mosquitoes free
of Wolbachia tested positive for 
the malaria parasite. 
Mathematical modelling 
suggested that even at this 
rate of Wolbachia infection, 
the bacterium could decrease 
the prevalence of malaria in 
humans. 
Nature Commun. 7, 11772 (2016) 
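
The proportions quoted in this summary can be rechecked from the counts it reports. The Python sketch below is illustrative only: it assumes that the number of Wolbachia carriers is the 221 mosquitoes sampled minus the 105 that were Wolbachia-free, a figure the summary implies but does not state, and all variable names are hypothetical.

```python
# Counts as reported in the summary above.
total_mosquitoes = 221
wolbachia_free = 105

# Assumption of this sketch: every sampled mosquito was either a
# Wolbachia carrier or Wolbachia-free, so carriers = total - free.
wolbachia_positive = total_mosquitoes - wolbachia_free

# One Wolbachia carrier was also infected with Plasmodium.
coinfected = 1

share_with_wolbachia = wolbachia_positive / total_mosquitoes
plasmodium_rate_in_carriers = coinfected / wolbachia_positive

print(f"Wolbachia carriers: {wolbachia_positive} ({share_with_wolbachia:.1%})")
print(f"Plasmodium among carriers: {plasmodium_rate_in_carriers:.2%}")
```

Under that assumption, the carrier share comes out at slightly more than half of the sample and the co-infection rate falls below 1%, consistent with the wording of the summary.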
SEVEN DAYS


POLICY 


Forensics guidelines
On 3 June, for the first time, the US Justice Department issued
recommendations meant to help to guide expert testimony and lab
reports in forensic science. If adopted, they would apply to all
agencies that issue forensic reports and would cover seven disciplines,
including body-fluid testing, fingerprints and toxicology. The
department will issue further guidelines later in the summer regarding
genetic testing and several other forms of forensic evidence.


Policy labours lost
An independent inquiry published on 2 June found that most UK
government departments don't keep track of the policy-linked research
that they have paid for in areas such as health, education or climate
change. Only 4 out of 24 government departments contacted through
freedom-of-information requests keep a centralized database of
commissioned research, according to Sense About Science, the
London-based science-advocacy group that commissioned the inquiry.
The report also points to several cases in which the publication of
sensitive findings — including on drug use and immigration — was
delayed owing to political concerns. See page 164 and
go.nature.com/1sqhudo for more.


Extreme weather rocks Europe


Severe storms and heavy rain caused havoc last week in parts of Western
and Central Europe. On 1 June, a flash flood triggered by extreme
downpours devastated the town of Simbach am Inn in southern Germany,
killing 7 people and causing damage across the region in excess of
€1 billion (US$1.1 billion). In Paris on 3 June, the River Seine’s
rising floodwaters led authorities to close the Louvre and Orsay museums
and move artworks to safety. And on 4 June, Germany’s largest rock
festival, near Mendig, was aborted after lightning struck at least
80 visitors. Because warmer air holds more moisture, rising temperatures
are increasing the odds of intense precipitation. Worldwide,
record-breaking rainfall events have been on the rise over the past
30 years.


SEEKING GREAT MENTORS!
For more than ten years, Nature has been recognizing outstanding
scientific mentors around the world. This year’s awards for great
mentors are focused on the US states of Washington, Oregon and
California. Each prize is an award of US$10,000. For details of the
competition and guidance about nominating candidates for lifetime and
mid-career awards, see www.nature.com/nature/mentoringawards/uswestcoast.
Nominations must be submitted by mentees of the nominees, and the
closing date is 8 August 2016.


Novartis Shanghai 


The pharmaceutical giant 
Novartis opened an extensive 
campus in Shanghai, China, 
on 2 June. The campus
hosts the third major site of
the Novartis Institutes for
Biomedical Research (NIBR),
where around 300 scientists
will use epigenetic approaches
to develop treatments for
cancers and other diseases
that are prevalent in China
and Asia. The other two
main NIBR campuses are in
Cambridge, Massachusetts, 
where research includes 
cancer, cardiovascular disease 
and diabetes; and in Basel, 
Switzerland, which focuses on 
autoimmunity, transplantation 
and inflammation. 


Solar power in Cuba 


A British solar company, Hive 
Energy, has secured a contract 
to build the first utility-scale 
solar power plant in Cuba.
On 3 June, the company,
based near Southampton,
announced plans to complete
a 50-megawatt solar-power
project by 2018 in a free- 
trade zone in the Cuban port 
city of Mariel. The project is 
one of many that the Cuban 
government committed to 
pursuing under the 2015 Paris 
climate agreement. 


Stem-cell setback 


The biopharmaceutical 
company StemCells of 
Newark, California, has 
prematurely terminated
a phase II clinical trial
examining whether cells
derived from fetal brains
could help to fix damaged
spinal cords and increase
muscle strength in paralysed
limbs. Interim analysis of
the 17 people treated so
far showed no significant
improvement, even though 
experiments in mice with 
spinal damage had been 
positive. The company is 
planning to close down its 
operations. Other trials of 
stem-cell preparations to 
treat spinal-cord injury are 
continuing. 


Shared data 


The US National Cancer 
Institute (NCI) unveiled a 
platform for sharing genomic 
and clinical data on 6 June
at the annual meeting of the
American Society of Clinical 
Oncology in Chicago, Illinois. 
The platform, called the 
Genomic Data Commons, 
will be a key component of 
US vice-president Joe Biden's 
National Cancer Moonshot 
and President Barack Obama's 
Precision Medicine Initiative. 


Tribute to Ali 


Millions of admirers,
including US President
Barack Obama, are mourning
the death of Muhammad
Ali (pictured), the legendary
boxer, political activist and,
in his later years, advocate for
Parkinson’s disease research.
The three-time heavyweight
champion died on 3 June,
aged 74, having lived for
more than 3 decades with the
degenerative brain disease.
In 1997, Ali co-founded the
Muhammad Ali Parkinson
Center, a research and
treatment facility in Phoenix,
Arizona.


Kavli Prize


On 2 June, nine scientists
were awarded the 2016
Kavli Prize, a biennial award
that honours individuals
for seminal advances in
astrophysics, nanoscience
and neuroscience. This
year’s laureates come from
Germany, Switzerland, Britain
and the United States. The
co-founders of the Laser
Interferometer Gravitational-wave
Observatory (LIGO),
Ronald Drever, Kip Thorne
and Rainer Weiss, won for
the detection of gravitational
waves. Gerd Binnig, Christoph
Gerber and Calvin Quate
were recognized for their
development of atomic-force
microscopy. Eve Marder,
Michael Merzenich and Carla
Shatz won for the discovery
of key brain mechanisms.
Recipients from each category
shared a US$1-million cash
prize.


TREND WATCH


RENEWABLE BOOM

Global power-generating capacity
from renewable sources excluding
hydropower had climbed to
785 gigawatts by the end of 2015,
after a record 120 gigawatts were
added during the year. China,
which hosts more than 25% of
the world’s non-hydro renewable
capacity, is the top country in
absolute terms, followed by the
United States and Germany.
China and the United States
both lag far behind Germany in
per capita capacity. Including
hydropower, renewables have a
24% share in world energy.


Courage to think 


Egyptian students and scholars
wrongfully detained in their
home country are being
honoured on 9 June with the
‘Courage to Think Defender
Award’ by academic-freedom
advocates Scholars at Risk. The
group cites an “overwhelming
crackdown” in recent years 
on Egypt's higher-education 
community, including 
“reported use of violence, 
wrongful prosecutions and 
imprisonment, professional 
retaliation and travel 
restrictions against scholars 
and students across the 
country”. According to data 
obtained by the Egyptian 
human-rights group 
Association for Free Thought 
and Expression, more than 
2,000 university students and 
professors in Egypt have been 
detained by security forces 
since July 2013. 


EVENTS 


Campus murder 


On 1 June, an armed 
38-year-old man entered the 
engineering department at 
the University of California, 
Los Angeles (UCLA), and 
fatally shot William Klug, a 
mechanical and aerospace
engineering professor. Police
think that the killer, who
received his doctorate with
Klug’s guidance in 2013, then 
shot himself. The authorities 
have not confirmed a motive 
for the murder-suicide, but 
stated that the gunman had 
accused Klug of stealing his 
computer code and giving it 
to someone else. Investigators 
found a ‘kill list’ in the 
gunman’s Minnesota home 
that included Klug, another 
UCLA professor (who was 
unharmed) and a woman 
whom police subsequently 
found dead in a Minnesota 
suburb. 


Pure gravity 

The European Space Agency's 
LISA Pathfinder team 
announced on 7 June that its 
mission to test technology for a 
space-based gravitational-wave 
detector has been a success. 
Results from the first two 
months of operation showed 
that two metal cubes housed
in the spacecraft were in
free fall through space, affected
almost solely by gravity,
with perturbations kept five
times lower than required. A
full-scale observatory, which 
will use similar cubes in 
spacecraft placed millions of 
kilometres apart to measure 
passing gravitational waves, is 
scheduled for launch in 2034. 


Brazilian anger 


Brazil's science ministry will 
not be reinstated, despite 
researchers’ protests, the 
country’s government says. 
The ministry was merged
with the telecommunications
ministry by interim President 
Michel Temer, who took office 
in May after the impeachment 
of Dilma Rousseff (see Nature 
533, 301; 2016). In the past 
week, scientists, already
angry at massive budget
cuts, have taken to the streets
in Natal, Rio de Janeiro and
São Paulo to protest against
the merger. But Gilberto
Kassab, who heads the
science-telecomms ‘superministry’, has
told Nature that the merger will
be maintained.


Gliders are among the instruments used at the seven monitoring sites of the US Ocean Observatories Initiative.


OCEANOGRAPHY 


US ocean-observing
project launches at last


Data from a network of deep-water observatories are now streaming in real time. 


BY ALEXANDRA WITZE 


Nearly ten years, US$386 million
and many grey hairs after it got the
go-ahead, an enormous US ocean-observing
network is finally up and running.
On 6 June, the National Science Foundation
(NSF) announced that most data are now
flowing in real time from the Ocean Observatories
Initiative (OOI), a collection of seven
instrumented arrays. Oceanographers have
the chance to test whether the technologically
complex and scientifically unprecedented
project will ultimately be worth it.


“It has been stressful,” says Richard Murray,
the NSF’s director for ocean sciences. “It’s not 
for the faint-hearted.” 

The raw-data streams came online in April,
months behind schedule, in part because of a
2014 switch between university subcontractors.

Through an open-records request, Nature 
obtained more than 1,200 pages of e-mails 
between project managers at the NSF and the 
Consortium for Ocean Leadership in Wash- 
ington DC, which built the observatory. The 
records reveal an extraordinary level of tension 
throughout 2014 and early 2015, as the final 
instruments were installed and the contract for
handling the data streams was switched from
the University of California, San Diego, to Rut- 
gers University in New Brunswick, New Jersey. 

“Please excuse my display of stress in this 
email, but the InBox is overflowing with high- 
priority, short-fuse items — none of which 
deserve to be ignored — but all of which can- 
not be completed within the requested time 
frames,” Timothy Cowles, then programme
director at the Consortium for Ocean Leader- 
ship, wrote to the NSF in January 2014. 

The NSF cited cost overruns and per- 
formance delays as justification for changing 
the cyberinfrastructure contract later that
year. In April 2015, an underwater
volcano laden with OOI
instruments erupted, just as sci- 
entists had predicted — but the 
live data were not yet flowing to 
the wider scientific community. 
Now, about 85% of OOI data 
are available in real time on the 
project’s website, with the per- 
centage growing every week, 
says Greg Ulses, the current 
programme director at the Con- 
sortium for Ocean Leadership. 
The information, on factors
such as temperature and salinity,
streams from more than
900 sensors at the 7 sites.

The OOI consists of one high-tech
cable on the tectonically
active sea floor of the northeast Pacific Ocean,
together with two lines of oceanographic
instruments (one off the US east coast and
the other off the west coast) and four
high-latitude sites, near Greenland, Alaska, Argentina
and Chile. Each array involves a combination
of instruments, from basic salinity sensors to
sophisticated underwater gliders.

The NSF built the network as a community
resource, hoping to stimulate an era of virtual
oceanography in which scientists explore
real-time data sets open to all (see ‘Virtual view’).

The Ocean Observatories Initiative
has built a global network of sites to
provide a continuous stream of data,
capturing events such as the April
2015 eruption of the Axial volcano.

“We know the data are valuable,” says Lisa
Campbell, a biological oceanographer at Texas
A&M University in College Station. “How to
implement it is what we're working on.”

Those involved in the OOI’s painful birth
are happy to see it working at last. “When
I finally got through and saw the real-time
data, I shouted so loud someone had to come
down the hall and close the door,” says Glen
Gawarkiewicz, a physical oceanographer at
the Woods Hole Oceanographic Institution in
Massachusetts.

The array off the coast of Massachusetts has
already captured some unprecedented observations,
he says. In 2014, it measured air-sea
fluxes when a hurricane passed overhead. The
following winter, it measured dramatic shifts in
the boundary at which shallow waters interact
with deep ones. “That has tremendous practical
implications, because there’s a lot of commercial
fishing in that area,” Gawarkiewicz
says. Using OOI data, he is now working with
local fishers to share real-time information on
changes in temperature and currents.

The west-coast array has studied a warm
blob of water linked to weather patterns that
are strengthening the ongoing drought in
California. And in the North Atlantic, off
the coast of Greenland, OOI scientists have
coordinated their measurements with those of
others, such as an international
programme to measure heat flow
in this key region. “These are
high-scientific-value sites that we
have dreamed about, and now we
have occupied them,” says Robert
Weller, a physical oceanographer
at the Woods Hole Oceanographic
Institution.

But the OOI’s future remains
murky. A 2015 review of US
ocean-science priorities suggested
that the programme's
operational budget should
be slashed by 20%, to around
$44 million a year. Yet each of the
arrays must be serviced every
year or two to replace broken
instruments and install new ones.
The NSF has not yet decided how it will save
that 20%.

Later this year, the agency will solicit bids to 
manage the OOI for the next five to ten years. 
Who responds, and with what suggestions, will 
help to determine what gets cut. “We built this 
thing, and will be funding operations for what 
the community feels is best,” says Murray. 

Ultimately, there is no metric for what 
constitutes a successful OOI. Ulses says that 
the project needs to run for a full year before 
managers can assess which scientists are using 
which data, and how stable and successful the 
data streams are. 

Weller would like to see a set of OOI meas- 
urements become as iconic as the records of 
atmospheric carbon dioxide levels taken at 
Mauna Loa, Hawaii, since the 1950s. “On any 
given day, I step back,” he says, “and am still sort 
of amazed that it’s all out in the water and most 
of it’s working.”


Panel tackles ‘compassionate use’


Companies pressured by social-media appeals seek fair way to allocate last-ditch treatments. 


BY SARA REARDON 


Nancy Goodman wanted to spend as
much time as possible with her dying
child. But even as ten-year-old Jacob’s
brain cancer worsened, Goodman spent
months contacting pharmaceutical companies
that were developing drugs that might help him.
‘Compassionate-use’ laws in the United
States allow pharmaceutical companies to 
provide unapproved drugs to patients in des- 
perate need, but many firms provide little or 
no information on how to request these treatments.
They are often reluctant to supply drugs
in response to such pleas, especially if drug
stocks are limited, although media campaigns 
on behalf of individual patients can sometimes 
embarrass firms into providing unapproved 
treatments. Anecdotes suggest that money and 
connections are also influential. 

Now, ethicists and medical experts are 
testing what they hope is a fairer system to 
distribute drugs in short supply. The approach, 
presented on 6 June at the American Society of 
Clinical Oncology meeting in Chicago, Illinois, 
is inspired by the method used to prioritize 
organ transplants. In a test case, researchers
worked with Janssen Pharmaceuticals to
determine how to distribute limited supplies of
daratumumab, an experimental drug intended 
to treat multiple myeloma. 

The 10-person panel combed through 
76 anonymized applications to determine 
how likely the drug was to work for each per- 
son, ultimately approving 60. “It’s hard to say 
no, because people die,” says Arthur Caplan, a 
bioethicist at New York University’s Langone 
Medical Center who is leading the effort. But 
he says that a systematic approach could help 
companies to make unbiased decisions. 

In Goodman's case, six of the eight companies
that she contacted never responded. The
other two declined to give her son their drugs
because the treatments had never been 
tested in children. Jacob died in 2009, and his 
mother went on to found the advocacy 
group Kids v Cancer in Washington DC. 

There are many legitimate reasons that com- 
panies might refuse to provide unapproved 
drugs, says Aaron Kesselheim, who studies 
health-care ethics at Brigham & Women’s 
Hospital in Boston, Massachusetts. People who 
request such treatments are often very ill, and 
companies worry that their deaths while receiv- 
ing the drug would reduce the compound’s 
chances of approval from the US Food and 
Drug Administration (FDA). Giving patients 
access to experimental drugs could also dis- 
courage them from enrolling in controlled tri- 
als that might assign a placebo, and would leave 
less drug available for use in the trial. 

“These requests are some of the most 
difficult decisions I face as a physician,” says
Amrit Ray, chief medical officer of Janssen in 
Titusville, New Jersey. “It's a trade-off we have 
to consider carefully.” 

Since 2014, 28 US states have enacted 
‘right-to-try’ laws, which allow companies 
to provide drugs to patients without involv- 
ing regulators. Caplan calls these “feel-good” 
laws, because the FDA approves most of the 
compassionate-use requests that it receives. (It 
is not clear how many applications are denied 
by companies and never reach the FDA.) 

Vickie Buenger, president of the advocacy 
group Coalition Against Childhood Cancer 
in Philadelphia, Pennsylvania, says that right- 
to-try statutes contribute to patients’ misun- 
derstanding about the factors that go into a 
decision to supply or deny access to a drug. “It 
implies that companies and the FDA are either 
angels of mercy if they come through, or devils 
who have no compassion if they withhold it.”

This lack of clarity, and poor communication 
by companies, has led many patients and their 
families to launch social-media campaigns to 
secure unapproved drugs. 

Perhaps the most famous case came in 2014, 
when the family of seven-year-old Josh Hardy 
began a Facebook campaign for an unap- 
proved antiviral drug called brincidofovir to 
treat a life-threatening infection. Its manufac- 
turer, Chimerix of Durham, North Carolina, 
had declined, on the grounds that giving the
drug to Josh, and to any subsequent petitioners,
would leave less of the compound
available for an ongoing clinical trial. Within
days, the Facebook page and Twitter campaign
#savejosh were featured on national television.
Chimerix quickly created a small clinical trial
with Josh as its first patient.

Josh Hardy received an experimental drug after his family launched a massive social-media campaign.

“Every single CEO woke up the next morning
and said, ‘Oh my gosh, that might happen
to me’,” says Elena Gerasimov, who directs
a programme at Kids v Cancer that helps
parents of children with cancer to petition
companies for drug access. (The FDA is
attempting to make this process easier.
On 2 June, it released new forms to simplify
the filing of compassionate-use appeals.)

Former Chimerix chief executive Kenneth 
Moch says that dozens of companies have since 
enlisted him as an adviser on such issues. His 
advice is simple: every company should create 
a transparent system to handle compassion- 
ate-use requests, guided by the FDA. That is 
in line with the advice of the Biotechnology 
Innovation Organization, an industry group 
in Washington DC that encourages its members
to develop clear policies to explain whether
they provide expanded access and to help physicians
to request drugs. “That's the least we can
do, to facilitate people being able to contact us,”
says Kay Holcombe, the group’s senior vice- 
president for science policy. 

Caplan and Ray plan to test their system on 
another treatment later this year — possibly 
a mental-health drug or a childhood vaccine. 
Caplan hopes that more companies will adopt 
the approach, and imagines eventually creating 
a compassionate-use consulting panel to aid 
small companies. 

Moch cautions that the approach might not 
be appropriate for every drug or company, but 
he likes how it helps to level the playing field. 
“Had Josh been a 37-year-old guy who kicked 
his dog and smoked, he wouldn't have gotten 
the same support as a lovely seven-year-old 
boy,” he says.

Patient advocates also support Caplan’s sys- 
tem for distributing drugs. “Putting it in the 
hands of people who understand the drug’s 
possibilities is a reasonable thing,” Buenger says. 

But many also want the FDA to create incen- 
tives for companies to provide drugs for com- 
passionate use. Until that happens, or until 
companies adopt programmes such as Caplan’s,
social-media campaigns and other public
appeals may be some patients’ only option. “I’d
do it,” Goodman says. “I’d do anything to save
my kid, anything to give Jacob a few more
months.”


Many miners in Peru’s Madre de Dios region use mercury to extract gold from sediments.


ENVIRONMENTAL SCIENCE 


Peru’s gold rush 
raises health fears 


Gold-mining boom in southeastern Amazon is driving high
levels of mercury contamination. 


BY BARBARA FRASER 


Long-running concerns about the
health effects of gold-mining in
the Peruvian Amazon came to a head on
23 May. Peru’s government declared a 60-day
public-health emergency in an attempt to 
address the problem of mercury pollution 
caused by unregulated gold-mining along the 
Madre de Dios River. 

Health-care and emergency workers are now 
providing medical and food aid for 25 affected 
villages, after a flurry of studies showed high 
levels of mercury in people, fish and sediments 
in the Madre de Dios region. The government 
estimates that some 48,000 people across 
85,301 square kilometres have been affected. 

“We now know with certainty what the 
source of the exposure is,” says Peru’s deputy 
health minister, Percy Minaya. “We are not 
going to solve this in two months, or even in 
a year, but the health ministry has to start.”
Symptoms of mercury poisoning include 
vomiting and diarrhoea. Extreme cases can 
lead to brain or kidney damage. 

The Madre de Dios region has a long 
history of small-scale alluvial gold-mining, 
but the rise in international gold prices in the 
past decade has brought a boom in the activity.
Peru's National Institute of Statistics and Information
in Lima reported on 23 May that gold
production for March in Madre de Dios was 
1,583 kilograms, up 28% on the same month 
last year. 

The region’s miners extract the gold by 
sluicing sediment to separate out gold-bearing 
sand, which they then mix with mercury to 
form an amalgamated lump of metal. Heating 
the lump vaporizes the mercury, leaving pure 
gold behind. The process sends an estimated 
30-40 tonnes of mercury each year into water- 
ways, where bacteria convert the metal into 
methylmercury. The methylmercury accumu- 
lates in fish, which are a key source of food for 
people in the Madre de Dios region. 

Perhaps unsurprisingly, researchers have 
found high levels of mercury (above the max- 
imum recommended by the World Health 
Organization) in hair samples from 40% of 
the Madre de Dios residents that they tested. 
The team, from Duke University in Durham, 
North Carolina, examined about 800 people 
who live along a major highway in the region, 
100 people who live beside the river and 2,000 
in the Amarakaeri Indigenous Reserve. 

Some communities in the region are closer 
to the gold-mining activities than others, but 
the 40% exposure rate held across the highway,
river and reserve, says study leader William
Pan, an epidemiologist with Duke’s Global 
Health Initiative. 

The presence of mercury in human hair usu- 
ally indicates that a person has been exposed 
to the metal through a dietary source. Pan says 
that the Duke studies in Madre de Dios show 
a strong correlation between human mercury 
exposure and fish consumption. Since 2009, 
research by Pan’s group (S. E. Diringer et al. 
Environ. Sci. Processes Impacts 17, 478-487; 
2015) and by tropical ecologist Luis Fernandez 
at the Carnegie Institution for Science at Stanford
University in California, has found high
mercury levels in some species of fish, particu- 
larly in large catfish and fish that eat other fish. 


BURDEN REDUCTION 

Peru’s government used the Duke team’s 
latest study to decide which riverside com- 
munities should receive the emergency aid. 
Officials are trying to help affected resi- 
dents to replace the high-risk fish in their 
diets with other sources of protein. During 
the emergency period, the government will 
give food, including canned ocean fish, and 
multivitamins to combat anaemia, to roughly 
15,000 of the 48,000 people affected. 

Pan says that these steps should reduce the 
body burden of mercury in people who also 
cut their consumption of contaminated fish, 
because the primary route of mercury expo- 
sure in the region seems to be through food. 

The government is also considering whether 
its food aid should include supplies of the grain 
quinoa. Preliminary data from Duke's house- 
hold surveys in the Madre de Dios region show 
a correlation between quinoa consumption 
and lower mercury levels. 

Minaya says that the government’s long- 
term plan also includes helping communities 
to establish fish farms. The emergency period 
is set to end days before a new president takes 
office on 28 July. But Minaya is confident that 
the next administration will continue to moni- 
tor and address the mercury pollution prob- 
lem, despite opposition from regional and local 
government officials. 

These officials have criticized the emergency 
decree, arguing that the link between people's 
mercury levels and fish consumption is not 
proven. The officials are also worried that the 
public-health emergency could harm tourism
in the nearby Manu National Park and
Tambopata National Reserve. 

Because of the growing concerns over mer- 
cury exposure, Fernandez is leading a project 
at Wake Forest University in Winston-Salem, 
North Carolina, to study the metal’s effects on 
human and environmental health in the Ama- 
zon. As director of the Center for Amazonian 
Scientific Innovation at Wake Forest, Fernan- 
dez will lead a team of US researchers who are 
collaborating with colleagues at the Peruvian 
Amazon Research Institute and the National 
Amazonian University of Madre de Dios.


HUMAN GENETICS


Ambitious plan to synthesize
human genome unveiled 


Some admire project to make genome from scratch; others say it hasn’t justified its aims. 


BY EWEN CALLAWAY 


Proposals for a large public-private
initiative to synthesize an entire human
genome from scratch — an effort that 
could take a decade and require billions of 
dollars in technological development — were 
formally unveiled on 2 June, almost a month 
after they were first aired at a secretive meeting. 

Proponents of the effort, named ‘Human
Genome Project-Write’ (HGP-write), write
in Science that US$100 million from a range of
funding sources would help to get their vision
off the ground (ref. 1). The team is led by synthetic
biologist Jef Boeke at New York University; 
genome scientist George Church at Harvard 
Medical School in Boston, Massachusetts; and 
Andrew Hessel, a futurist at the commercial 
design studio Autodesk Research in San Rafael, 
California. 

But the idea — which essentially aims to 
develop technologies that reduce the cost of 
DNA synthesis — has not met with universal 
excitement among researchers. To some, the 
proposal to create a human genome is praise- 
worthy for its ambition and sheer chutzpah: at 
present, only tiny bacterial genomes and a por- 
tion of a yeast genome have been made from 
scratch. But others feel that HGP-write repre- 
sents a needless centralization of work that is 
already taking place in companies trying to 
lower the price of synthesizing strings of DNA 
(see ‘Falling costs’). Some of HGP-write’s propo- 
nents have financial stakes in those firms, which 
include Gen9 in Cambridge, Massachusetts. 

“My first thought was ‘so what’,” says Martin
Fussenegger, a synthetic biologist at the Swiss
Federal Institute of Technology in Zurich. “I
personally think this will happen naturally. It's
just a matter of price at the end.”

Others think that the project should be 
delayed until its leaders can win broader sup- 
port for the idea. In an e-mail sent to reporters, 
synthetic biologist Drew Endy, at Stanford Uni- 
versity in California, and religion scholar Laurie 
Zoloth, at Northwestern University in Evanston, 
Illinois, say that the HGP-write team has not 
properly justified its aims, and that the project 
should be abandoned. “We are still waiting for 
a serious public debate with participation from
a broad range of people,” they say.

Endy and Zoloth had already questioned the 
scientific rationale for synthesizing a human
genome in May, when HGP-write was first
aired at an invitation-only meeting at Harvard 
University that was attended by more than 100 
scientists, entrepreneurs, lawyers and ethicists. 
The closed nature of the meeting also attracted
criticism: Church told the health and medicine
news service Stat that this was because the
paper describing the effort was under embargo.
“There was a lot of confusion on the day about
what was going on,” says Tom Ellis, a synthetic
biologist at Imperial College London.

The price of ‘writing’ DNA is falling more slowly
than the price of reading it.

The three-page announcement of HGP-write 
fills in some detail. It notes that current tech- 
nologies are both too expensive and too primi- 
tive to synthesize the 3-billion-base-pair human 
genome. The team calls for a series of pilot 
projects, including synthesizing much shorter 
segments of the genome and making slimmed- 
down chromosomes to do specific tasks, to 
make its eventual goal doable. The whole pro- 
ject should require less than $3 billion (the 
price of the publicly funded Human Genome 
Project), the researchers say. 

“I think it’s a brilliant project,” says Paul
Freemont, a structural biologist at Imperial
College London. “If you want to do this, it’s
going to be on the same scale as the Human
Genome Project, it’s going to need some big
funding agencies and hundreds and hundreds
of researchers around the world.”

Ellis and others worry that a centralized 
project that explicitly focuses only on the 
human genome might needlessly narrow the 
products of the effort. But Boeke, who in
2014 reported synthesizing a 270,000-base-pair
yeast chromosome (ref. 2), brushes that objection
aside. He envisions HGP-write eventually
producing synthetic genomes from mice, 
microbes and all sorts of organisms. 

“Tangible products may be slow to follow 
at first, but writing DNA more cheaply and 
at large scale will make researchers more effi- 
cient and comprehensive in their work, leading 
to practically unlimited potential for indirect 
products,” adds Danielle Tullman-Ercek, a
biochemical engineer at the University of 
California, Berkeley. 

Cheaper DNA synthesis isn’t the only thing 
that stands in the way of writing a human 
genome that could function inside a cell. Ellis 
says that there are no methods for inserting 
very large pieces of DNA into a mammalian 
cell and making them function normally, and 
researchers have little clue how to design a com- 
plex genome that has anything more than trivial 
changes to an existing one. 

Freemont worries that commercial bodies 
such as DNA-synthesis companies may stake a 
claim on the project. “I think it’s good if this is an 
open, publicly funded initiative,” he says. 

Boeke would prefer that there be no intellec- 
tual-property restrictions on the products of 
HGP-write, as is the case with his synthetic yeast 
genome project. But, he says, the “chances are 
good” that companies involved in HGP-write 
will be granted such rights “to get the job done”. 

In the Science article, the HGP-write team 
says that it will seek broader public buy-in 
before beginning work. Boeke says that much 
of the Harvard meeting centred on the ethics 
of the project. And he says that the synthetic 
cells will be engineered to make reproduction 
impossible. “We're not trying to make an army 
of clones or start a new era of eugenics. That is 
not the plan.”
1. Boeke, J. D. et al. Science http://dx.doi.org/10.1126/science.aaf6850 (2016).
2. Annaluru, N. et al. Science 344, 55-58 (2014).
UK loses track
of its policy 
research 


Report slams government 
for missing studies. 


BY DANIEL CRESSEY 


The UK government doesn’t know
how much policy-linked research it
has commissioned, or how much of
such research has been published. That is the
stark conclusion of an independent inquiry, 
published on 2 June, which details confusion 
about the status of research produced for 
government departments in areas ranging 
from social policy to climate change. 

The inquiry was carried out by Stephen 
Sedley, a judge, law professor and trustee 
of Sense About Science, the London-based 
science-advocacy group that published the 
report. He spoke to government advisers, 
civil servants and researchers, and used 
multiple freedom-of-information requests 
to find out how much research commis- 
sioned by the government gets published. 

According to official estimates, the 
government spends around £2.5 billion 
(US$3.6 billion) a year commissioning 
research linked to policy issues. But, Sedley 
says, it has “no comprehensive account” of 
how much is commissioned or published. 

Just 4 out of 24 government departments 
told Sedley that they kept a centralized 
database of commissioned research. Others 
could not provide a list of the studies that 
they carried out or commissioned. Many 
departments said that it would be too costly 
to provide the information, because it was 
held in many different files and locations. 

Civil servants told Sedley that they often 
waste time trying to find past studies. The 
report also notes several cases in which the 
publication of reports has been delayed 
owing to “political concerns about the 
implications of the research” — including 
work on drug policy and immigration. 

“The fact that a few departments do 
maintain a research register, handle 
awkward findings and publish promptly 
exposes the excuses of those that don’t,”
said Tracey Brown, director of Sense About 
Science, in a statement. The report calls 
for a central register of all government- 
commissioned research, a commitment 
to prompt publication, and routine publi- 
cation of any work that has been used to 
inform government policy.
The late rock-art specialist Mike Morwood at Liang Bua cave, where his team discovered Homo floresiensis.


PALAEOANTHROPOLOGY 


Hobbit relatives 
hint at family tree 


Possible ancestors of Homo floresiensis found after long hunt. 


BY EWEN CALLAWAY 


More than a decade after the discovery
that a diminutive relative of modern
humans once lived on the Indonesian
island of Flores, Gerrit van den Bergh
was losing faith that he would find any clues
to the ancestors of the ‘hobbit’. It was October
2014, and for four years he had co-led an
industrial-scale excavation near the cave where 
the metre-tall skeleton had been found. Then, 
weeks before packing it in for the year, a local 
worker found a 700,000-year-old molar. More 
teeth and a partial jaw quickly followed. 

“We had given up hope we would find anything,
then it was ‘bingo!’,” says van den Bergh,
a palaeontologist at the University of Wollon- 
gong, Australia, whose team reports the finds 
in two papers in this issue (G. D. van den Bergh 
et al. Nature 534, 245-248; 2016; and A. Brumm 
et al. Nature 534, 249-253; 2016). “We had this 
enormous party. We had a cow slaughter and 
there was dancing. It was marvellous.” 

The unusually petite jaw and teeth are from 
at least one adult and two children — the
first possible ancestors of Homo floresiensis 
ever to be discovered — and resemble the 
hobbit remains found on the island, which are 
between 60,000 and 100,000 years old. 

The jaw and teeth address two questions that 
have dogged the study of the species — where 
did it come from and how did it get so small? 
But as with all things hobbit, there is little consensus
among researchers, who say that firm
conclusions require more fossils. 

The hobbit’s discovery in 2003 in Liang Bua 
cave, by a team led by the late Australia-based 
rock-art specialist Mike Morwood, was an 
instant sensation. But its place in the human 
family tree is contentious. Morwood’s team 
proposed that it was a shrunken Homo erectus, 
the same species that probably evolved into 
Homo sapiens in Africa and that roamed as 
far as Europe and Asia. Other scientists who 
have examined features of H. floresiensis, such 
as its long, flat feet, think that it descended 
from a smaller, more primitive human relative 
such as Homo habilis or even Australopithecus, 
known only from remains in sub-Saharan
Africa. 

Seeking the hobbit’s ancestors, in 2004, 
Morwood’s team returned to a site 74 kilome- 
tres from Liang Bua called Mata Menge, where 
elephant bones and tools had been found in the 
1960s. The dig started small, but in 2010 the 
team scaled up. Bulldozers cleared an area of 
2,000 metres square, and more than 100 locals 
then dug for 6 days a week using chisels and 
hammers. They found hundreds of stone 
tools, thousands of fossils from animals such 
as crocodiles, rats and Komodo dragons, but
no hominin bones. 

By then ill with advanced prostate cancer, 
Morwood visited the area for the last time in 
2012. “He really made an effort to walk through 
the site, you could see he was in pain, but he was 
so detailed-minded,” van den Bergh says. “He
increased the pressure to dig more holes and go
faster. He really wanted to find them.”

Morwood, who died in 2013 before the teeth 
and jawbone were found, is an author on the 
Nature papers, which were co-led by scientists 
based in Japan, Australia and Indonesia. 

The team concludes that the jaw excavated at 
Mata Menge is from an adult (its wisdom tooth 
had erupted) who was even smaller than the 
hobbit, and that two canines are the milk teeth 
of two different children. The thin jaw looks
more like that of H. erectus and H. floresiensis
than the beefier jaws of more primitive hominins
such as H. habilis. The square-shaped
teeth are intermediate between H. erectus
and H. floresiensis.

One tooth and the rock around it led the
team to estimate that the remains are some
700,000 years old. The oldest artefacts
in the region, meanwhile, suggest that a group
of Homo erectus arrived on Flores about one
million years ago, says van den Bergh.


DWARFED BY DIET 

He and his team note that the remains point to 
large-bodied H. erectus as the likeliest ances- 
tor of the hobbit, and propose that it shrank 
in just a few hundred thousand years to cope 
with the meagre resources on Flores. Elephants 
and other large creatures have been known to 
shrink over time to cope with the lack of food 
typical of islands, and red deer on the island of 
Jersey in the English Channel shrank to one- 
sixth of their original size in just 6,000 years, 
says van den Bergh. 

Both Fred Spoor, a palaeontologist at Uni- 
versity College London, and palaeoanthro- 
pologist Chris Stringer at London’s Natural 
History Museum agree that H. erectus is now 
the best fit for the hobbit’s ancestor, although
Stringer isn’t so sure that the shrinkage
happened on Flores. It’s just as likely that the 
hobbit emerged on another island, such as 
Sulawesi, and then moved to Flores, he says. 

But William Jungers, a palaeoanthropolo- 
gist at Stony Brook University in New York, 
says that the fossils are not complete enough 
to favour the H. erectus origin: “I don't believe 
these scrappy new dental specimens inform 
the competing hypotheses for the origin of the 
species one way or another.” 

A small river that leads down a hill deposited 
the sandstone in which the teeth and jaw were 
found, and van den Bergh expects that more 
hominin remains lie there. His colleagues, 
meanwhile, have found stone tools in Sulawesi, 
north of Flores. For once, the prospect of more 
hobbits isn’t looking so bleak.

SEE EDITORIAL P.151 AND NEWS & VIEWS P.188


CORRECTION 

In the News Feature ‘South Korea’s Nobel 
dream’ (Nature 534, 20-23; 2016), one 
paragraph incorrectly gave numbers
in billions instead of trillions of won. In
fact, 63.7 trillion won was spent on R&D, 
49.2 trillion of which came from private 
enterprise; and 11.2 trillion won was spent 
on basic research. 
A brown cloud of pollutants hangs over the outskirts of New Delhi.
BY MEERA SUBRAMANIAN


India’s capital scrambles to tackle its 
epic pollution problems. 


On winter nights, New Delhi burns with innumerable fires. Flames flicker along
pavements and street corners, where the destitute huddle to stay warm and
cook their suppers, while night watchmen stand guard next to their own small
blazes outside private homes. The rising plumes of smoke mingle with exhaust
and dust stirred up by overloaded trucks that rumble down roads blanketed in
fog. The mixture melds into a nearly opaque substance that leaves a metallic
taste on the tongue. Overhead, there is not a single star to be seen.

With dawn comes a hint of warmth, but the sunlight remains hidden by haze. A hopelessly 
optimistic sign — “Make Delhi Pollution-Free” — is lashed to a metal cage that protects a young 
sapling, its withered leaves caked with dust. 

The grime is the most obvious part of the pollution that plagues India’s capital region and its 
25 million people. Less discernible are the airborne particles smaller than 2.5 micrometres in diameter,
known as PM2.5 — the most harmful size range. Just a fraction of the diameter of a human hair
and astoundingly aerodynamic, PM2.5 can penetrate deep into the body, reaching the recesses of the
lungs. The particles are a nasty amalgam of pollutants both natural and increasingly anthropogenic, 
generated from sources within the city’s boundaries and hundreds of kilometres away. The World 
Health Organization (WHO) declares that no amount of this pollutant is safe to breathe. 

Two years ago, Delhi had the highest PM2.5 levels of 1,600 cities surveyed by the WHO. Last
month, in an updated and expanded inventory, Delhi retained its status as the most polluted
of the world’s largest cities, with an annual PM2.5 average of 122 micrograms per cubic metre
(µg m⁻³) — three times the permitted Indian standard and greatly exceeding the WHO standard
of 10 µg m⁻³. The pollution, which comes mainly from combustion of wood, coal, gas, diesel and
crop residue, is worst in the winter, when wood-burning peaks and
cold-weather inversions trap pollutants close to the ground and cause
spikes in the daily average of above 600 µg m⁻³. Late last year, the levels
prompted the Delhi High Court to declare the city a “gas chamber”.

The observed PM2.5 amounts are estimated to cause as many as
16,000 premature deaths and 6 million asthma attacks in Delhi annually,
shaving around 6 years off the life expectancy of city residents.
Although the WHO data have brought attention to Delhi, the problem 
is global: according to the agency, particulate pollution affects more 
people than any other pollutant on Earth. 


AIR PATROL 
Delhi, like Beijing, Mexico City, London and Los Angeles, is struggling 
to reduce pollution, even as its population swells. Researchers and the 
government are trying to construct a detailed breakdown of the differ- 
ent pollution sources, and authorities are experimenting with ways to 
mitigate the damage, from restricting when people can drive to shutting 
down power plants. But India faces unique challenges. Its population 
is concentrated in the north, an area geographically prone to pollution, 
and its people have aspirations for development. There is a growing 
middle class hungry to own cars, and one-fifth of the population merely 
wants access to basic electricity. Those facts threaten to compromise 
Delhi's efforts to improve environmental quality. 

“The reality is that the pollution in Delhi is very complex. There are 
a lot of sources. It varies from season. It varies by time of day. It varies 
by neighbourhood,” says Namit Arora, a member of the pollution task
force of the Delhi Dialogue Commission, a government initiative in the
city. But he insists that the city can make progress. “We can act, and we
need to act, on multiple fronts simultaneously.”

Delhi is trying to do just that. Long before it was saddled with the 
mantle of having some of the world’s worst air, the city and the Supreme 
Court of India took several steps to alleviate pollution. Vehicle emissions 
came down in the early 2000s, thanks to decisions to remove lead from 
petrol, improve vehicle-emission standards and pull old commercial 
vehicles off the roads. Around the same time, the city implemented 
a monumental conversion of the public-transportation fleet, includ- 
ing buses and the city’s zippy three-wheel auto-rickshaws, away from 
gasoline and diesel engines to ones fuelled by cleaner compressed natu- 
ral gas. In 2002, the Delhi Metro subway system opened its first line, 
improving public-transportation options. All but two of the coal-based 
thermal power plants in the city were converted to natural gas, and many 
industries, including brick kilns, were moved beyond Delhi’s bounds. 

These efforts reaped big gains, yet they have been offset by the incessant 
growth of the megacity. Since 2000, Delhi’s population has nearly doubled. 
And the number of vehicles has almost tripled, from 3 million to more 
than 9 million, according to the government. 

Determining what creates hazardous PM2.5 is a crucial step in
reducing it, yet past studies have varied markedly because they have 
used different methods and relied on limited data. Filling some of 
the gaps is a report released in January by the Indian government and 
the Indian Institute of Technology (IIT) Kanpur, which took a more
comprehensive approach to investigating the causes of Delhi’s poor air 
quality. Some sources contribute throughout the year, such as pollu- 
tion from vehicles, diesel generators, construction dust, biomass and 
coal combustion and industries. Others are seasonal: dry summer
dust blowing in from nearby deserts, autumn crop burning and Diwali 
holiday fireworks, and the warming fires that make the city glow come 
winter (see ‘Poison stew’). 

Vehicle emissions are constant, and with all those belching tailpipes 
in sight every day, they often capture the attention of pollution-fighting 
officials. In January, Delhi implemented an odd-even programme that 
allowed car owners to drive only every other day, as dictated by their 
vehicle's number plate. When the 15-day experiment came to a close, 
a few hundred Delhiites took to the streets to praise the initiative and 
rally for continued efforts. Families, musicians, passionate teenagers, 
costumed 20-somethings and activists all gathered near Jantar Mantar, 
a cluster of monuments built in the 1700s to study skies that were then 
crowded with visible stars. But over the temporary stage set up for the 
event, an air monitor showed that PM2.5 levels were hovering around
184 µg m⁻³, a level that warrants staying indoors to reduce exposure.

Although levels remained well above acceptable during the odd-even 
experiment, several researchers declared it a success in both lowering 
emissions and, perhaps more importantly, raising public awareness. 
“People are willing to display the civic sense we thought didn’t exist 
here,” says Arora, who was at the demonstration wearing a shirt emblazoned
with the words “Help Delhi Breathe” and an image that was half
flower bloom, half gas mask. 

Still, government officials and researchers admit that this approach 
has limited long-term potential because of how difficult it is to enforce 
the ban. During January's trial, people were already talking about buying 
a second car as a workaround. After the odd-even effort was repeated 
in April, the government estimated that there were half a million more 
vehicles on the roads than during January’s trial, according to local 
media, which suggests that people were skirting the rules. 

The odd-even policy gets a lot of attention, “but that is not the solution”,
says Ashwani Kumar, who was secretary of the environment of
the Delhi Pollution Control Committee (DPCC) at the time. “A typical
democratic society cannot depend on a ‘don’ts’ approach,” he says.
Making it more expensive and inconvenient to drive a car, for example,
would naturally spur people to use public transportation. “It has to be
based on incentive and disincentive; otherwise, it’s easy to find loopholes
and descend into a quagmire of corruption.”


GOING PUBLIC 

Although parts of Delhi’s public-transportation system are impressive, 
others are lacking. The Delhi Metro is an efficient, extensive electric- 
rail system with more than 200 kilometres of lines, and it continues to 
expand. But the Indian media has reported that the government's plans 
to increase the number of buses in Delhi have been plagued by delays, 
and that a pilot programme for dedicated bus lanes was met with so 
much public resistance that the lanes are now being dismantled. 

“If the public-transportation system is robust, and it’s made safe 
and comfortable and reliable, people will automatically switch,” 
says Sarath Guttikunda, director of the independent research group 
UrbanEmissions.info, which is registered in Delhi. 

Nudging people onto buses and subways will deal with only part of 
the transportation pollution problem. Although vehicles as a group contribute
up to about one-quarter of PM2.5 in Delhi, the fraction generated
by heavy-duty freight trucks is twice that of cars. To ease the impact of 
the estimated tens of thousands of trucks that move through the city 
daily, the Supreme Court has implemented new taxes on them, and 
Delhi is adding bypass highways. 

The other crucial ingredient is the type of fuel that goes into vehicles. 
Given that diesel engines produce much more particulate matter than 
ones that run on petrol, the rising percentage of luxury diesel cars is a 
troubling trend. To try to stem the sales, the government temporarily 
banned registration of diesel vehicles with larger engines earlier this 
year, according to Indian media. 

Delhi's emissions standards for vehicles are more stringent than those 
in the rest of India, but they still lag far behind those in Europe. And 
the discrepancy between city and national standards means that many
vehicles operating in Delhi spew emissions from lower standard vehicles 
and fuel obtained beyond city limits. 

Yet all this attention on vehicles is somewhat misplaced because they are 
not the biggest source of particulate pollution. “If the goal is to reduce PM2.5,
we need to go beyond traffic,” says atmospheric scientist Pallavi Pant of
the University of Massachusetts-Amherst, who studies Delhi’s air quality.

Just a few steps away from the busy roads, on a broken stretch of pavement,
a woman tends a cooking pot perched on three stones, a wood
fire burning below. Across South Asia, more than one-quarter of the
outdoor air pollution comes from these traditional stoves. In urban
Delhi, only about one in ten households still relies on smoky stoves that
use wood, dung or kerosene, but they still contribute a substantial part
of the city’s PM2.5 burden. One study found that the emissions from
fires used domestically for cooking and warmth rival those from the
electricity sector, from brick kilns and from industry.

Ever-present construction and smoke-belching vehicles are major
contributors to air pollution.

Delhi's government and the country’s Supreme Court have tried to 
limit some of these sources, but there is one major factor that they can- 
not control: the city’s location, far from the cleansing breezes of the 
ocean. From the west, smoke from agricultural crop burning and dust 
storms from the Thar Desert blow into the city. And from the north, 
cold fronts sweep down from the Himalayas, locking pollutants in with 
winter weather inversions. When it comes to air pollution, the city is 
“geographically disadvantaged”, says M. P. George, an environmental
scientist with the DPCC. 

Although Delhi can’t fend off desert dust storms, it can control 
construction dust generated by its never-ending building spree. Officials 
are attempting to decrease air pollution from construction by improving 
and enforcing rules, such as requirements to cover building sites and 
trucks to stop dust from blowing away. 

The problems that Delhi faces also afflict other cities in the region, 
creating a vast brown cloud that in satellite images seems to smother 
much of South Asia. About one-third of Delhi’s particulate pollution 
comes from sources outside the city, according to the IIT Kanpur study.

“It’s not just a Delhi problem; it’s a regional problem,” says Milind
Kandlikar, who researches development and environmental issues at the
University of British Columbia in Vancouver, Canada. India’s northern
cities have come to dominate the WHO’s list of most-polluted cities,
with half of the top 20 all located in the region. Delhi's efforts will falter
unless the rest of the country also steps up, says Kandlikar. “Everybody
else has to be involved. This is not going to go away easily.”

But political conflicts are threatening the chances for broad-scale
pollution-control efforts. There is currently a fractious tension between
the national government, run by Prime Minister Narendra Modi’s
Bharatiya Janata Party, and Delhi’s ruling Aam Aadmi Party — a situation
that hinders efforts to develop a unified strategy to deal with air
pollution. In lieu of any cooperation, the Supreme Court continues to
have an integral role in improving air quality by directing the government
to restrict emissions from sources such as vehicles and power
plants. “My reading is that the politicians are often grateful to the courts
for acting, because many of these measures are unpopular,” says Arora.

Poison stew

Delhi has the highest particulate air-pollution readings of any megacity. A study released this year
by the Indian Institute of Technology Kanpur found that different sources dominate in winter and
summer for particles smaller than 2.5 micrometres, known as PM2.5.
[Figure: winter and summer source shares; the legend includes vehicles, secondary particles, and burning of wood, dung and agricultural waste for cooking or heating.]

With so many sources of air pollution, city officials would ideally like
to know which programmes will work best — so the Delhi Dialogue
Commission task force recruited IBM Research-India to develop computer
models to forecast pollution levels.

“The idea is to create a simulation framework which will allow them
to evaluate different policies and see their impact before they actually
implement them,” says Ashish Verma, a senior manager with IBM’s
Smarter Planet team in Delhi. Although the modelling is still in progress,
early results reinforce how important the weather is. This suggests
that air-pollution controls might be most effective if they take advantage
of specific weather conditions, for example, by closing power plants or
limiting traffic during the worst pollution-trapping weather inversions.


CITIZEN ACTION

As it seeks cleaner air, Delhi has to confront the country’s desperate desire
for development, which includes providing electricity to the roughly
240 million people who still lack it. India pledged to expand its renewable
energy capacity aggressively, as part of the national plan that it submitted
during the United Nations climate-treaty negotiations last year. But the
plan also defends India’s right to use fossil fuels, stating that “in order
to secure reliable, adequate and affordable supply of electricity, coal will
continue to dominate power generation in future”.

Both the climate plan and government officials such as Kumar insist
that India need not follow conventional modes of development that rely
on fossil fuels. “There is no way we can adopt the technology trajectory
of the developed countries who continue to be the biggest polluters in
terms of per capita,” Kumar says. But it is not clear whether India will be
able to leap-frog past the most polluting forms of energy.

One encouraging sign in Delhi is that air quality is now part of an active
city-wide conversation. Full-page newspaper ads solicit citizen input on
odd-even schemes, and the Delhi government has teamed up with the
University of Chicago’s Delhi-based academic centre to launch a design
competition called the Urban Labs Innovation Challenge: Delhi, which
aims to crowdsource ideas for improving air and water quality. In March,
it received hundreds of submissions, including ideas for promoting rooftop
solar panels and creating viable alternatives to burning waste and
crops. Prize money of up to US$300,000 will fund design pilots.

In Delhi and around the world, citizens, governments and researchers
are all demanding more air-quality data, which indicates an interest in
knowing the enemy. Information from government monitors is publicly
available, but the interfaces are often clunky. This has given rise
to new alternatives such as IndiaSpend, an independent media outlet
that also conducts ‘sensor journalism’, in which readings from its own
pollution monitors are made available in a user-friendly format. The
sheer increase in global monitoring — the WHO's database doubled in
two years because many more cities had begun to monitor their air —
offers an opportunity for more-comprehensive studies going forward,
including in regions that have previously been neglected.

Meanwhile, concerns about pollution are hard to escape. In upscale 
Delhi markets, vendors hawk air masks and purifiers, and parents can 
purchase nebulizers decorated with cute animals to appeal to children 
with asthma. Doctors have been known to advise patients with lung 
ailments to leave the city. “Picking up a life is not so easy,” pleads one
mother. “What are we supposed to do?” 

Many Delhiites tend to swing between despair and hope — just as 
the skies go through their own cycles. By late March, the weather starts 
to shift and the winds pick up. It’s as if the city’s windows are thrown 
open, allowing fresh breezes to blow through. July brings the monsoon 
rains and washes much of the danger out of the sky, down towards 
the Yamuna River, which carries some of the burden away. For a few 
months, citizens can step into the night, tilt their heads back and once 
again enjoy the stars overhead.


Meera Subramanian is a journalist in Cape Cod, Massachusetts, and 
is the author of A River Runs Again: India’s Natural World in Crisis, 
from the Barren Cliffs of Rajasthan to the Farmlands of Karnataka. 
SANDY HUFFAKER


Apart from the treadmill desk, Pieter Dorrestein’s office at the
University of California, San Diego (UCSD), is unremark- 
able: there is a circular table with chairs around it, bookshelves 
lined with journals, papers and books, and a couple of plaques 
honouring him and his work. 

But Dorrestein likes to offer visitors a closer look. On his 
computer screen, he pulls up a 3D rendering of the space. Four figures 
seated around the table — one of whom is Dorrestein — look as if 
they’ve been splashed with brightly coloured paint. To produce the 
image, researchers swabbed every surface in the room, including the 
people, several hundred times, then analysed the swabs with mass 
spectrometry to identify the chemicals present. 

The picture reveals a lot about the space, and the people in it. Two of 
Dorrestein’s co-workers are heavy coffee drinkers: caffeine is splotched 
across their hands and faces (as well as on a sizeable spot on the floor — a
remnant of an old spill). Dorrestein does not drink coffee, but has left
traces of himself everywhere, from personal-care products to a common
sweetener that he wasn’t even aware he’d consumed. He was also
surprised to find the insect repellent DEET on many of the surfaces
that he had touched; he hadn’t used the chemical in at least six months.

Then there were signatures of the office's other inhabitants: the 
microbes that reside on human skin. Dorrestein has been using mass 
spectrometry to look at the small molecules, or metabolites, produced 
by these microbes, and to get a clearer picture of how microorganisms 
form communities and interact — with other microbes, with their 
human hosts and with the environments that they all inhabit. 

He has analysed microbial communities from plants, seawater, 
remote tribes, diseased human lungs and more, in an effort to listen in 
on their chemical conversations: how they tell one another of good or 
bad places to colonize, or fight over territory. The work could identify 
previously unknown microbes and useful molecules that they make, 
such as antibiotics. 

“The applications are broad,” says Katie Pollard, a comparative
genomicist at the Gladstone Institutes at the University of California, 
San Francisco. Because many microbes cannot be cultured and studied 
directly, she explains, “these approaches that assay them in situ are 
totally game-changing”. They also directly address some of the main 
goals outlined in the US$521-million National Microbiome Initiative, 
announced by the White House’s Office of Science and Technology 
Policy last month. Dorrestein was present for the announcement. 

In this fast-moving field, Dorrestein has set himself apart by building 
useful tools and productive collaborations. “Pieter is genuinely interested
and very creative,” says Janet Jansson, division director of biological
sciences at Pacific Northwest National Laboratory in Richland, Washington.
In April, she visited UCSD, and Dorrestein asked whether he could swab
her hand for one of his studies. “I said, ‘Oh! I want to do that! I want to
be involved in that study!’” Jansson recalls. “It’s interesting and exciting
science that people want to participate in.”


ROCK AND ROLL 

Dorrestein grew up in the Netherlands, and became obsessed with rock- 
climbing when he visited family friends in Tucson, Arizona, at the age 
of 16. Faced with the flatness of his homeland, he applied to Northern 
Arizona University in Flagstaff, in large part because of its proximity to 
the many stone towers of the Four Corners region, where Arizona meets 
New Mexico, Colorado and Utah. He studied geology and chemistry, 
but intended to pursue his passion for climbing. Shortly after graduating 
in 1998, however, an experience on the 900-metre-tall face of El Capitan 
in Yosemite, California, made him think again. 

He was clinging to the rock about 50 metres above his last anchor- 
ing point, and realized that if he were to lose his grip, he would drop 
100 metres before his safety line tautened and 
slammed him into the granite. It wasn't fear, he 
says, but rather his lack of it that troubled him. 
“I thought, if I keep doing this, it won't be a
good ending,” he recalls. “So I rappelled down.”


Pieter Dorrestein’s 
methods could reveal 
what microbes do in 
complex communities. 
He drove home to Flagstaff that day, and started filling out
applications to graduate school. He ended up at Cornell University in 
Ithaca, New York, studying how microbes produce small molecules 
such as vitamin B1. It was here that he was first introduced to mass 
spectrometry. 

Mass spectrometry generally involves breaking complex molecules 
apart, ionizing them and measuring the mass of the resulting fragments, 
which can be used to calculate the composition of the starting mol- 
ecules. Dorrestein uses the analogy of a bar code — mass spectrometry 
creates a unique identifier for each chemical in a sample. 
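
The link between a measured mass and a molecule's composition can be illustrated with a small calculation. The sketch below is not taken from Dorrestein's work: it simply sums standard monoisotopic atomic masses for a candidate formula and checks whether an observed ion fits it. The caffeine formula (C8H10N4O2) is standard chemistry; the function names, the observed value and the 10 parts-per-million tolerance are assumptions chosen for illustration.

```python
# Monoisotopic masses (in daltons) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON_MASS = 1.00727647


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def matches(observed_mz, formula, tolerance_ppm=10.0):
    """Check whether an observed singly protonated ion fits a candidate formula.

    The tolerance is a hypothetical value, not an instrument specification.
    """
    expected = monoisotopic_mass(formula) + PROTON_MASS
    return abs(observed_mz - expected) / expected * 1e6 <= tolerance_ppm


caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
print(round(monoisotopic_mass(caffeine), 4))  # 194.0804
print(matches(195.0877, caffeine))            # invented reading that fits caffeine
```

Real identification also relies on the pattern of fragment masses, which is what makes the 'bar code' specific to one chemical rather than to every molecule sharing a formula.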

Spurred by his interest in the technology, he went on to do a post- 
doc in the lab of Neil Kelleher, a chemical biologist at the University of 
Illinois at Urbana-Champaign. Kelleher was pioneering efforts to do 
‘top-down’ mass spectrometry, in which intact, rather than digested,
proteins are put directly into the mass spec. The approach allows 
researchers to identify small modifications made to proteins, but the 
process is slow. Within two months of his arrival in Illinois, Dorrestein 
had developed a speedier approach that allowed him to examine
certain large enzymes systematically. “We boiled down years of work into
days, basically,” Dorrestein says. He ended up co-authoring 17 papers
in 2 years. “Pieter has that unusual combination of creativity and drive,
along with an incredible ability to finish projects,” says Kelleher, who is
now at Northwestern University in Evanston, Illinois. 

Dorrestein joined the faculty at UCSD in 2006 — but things really 
kicked off for him when Palmer Taylor, then dean of the university’s 
school of pharmacology, authorized the purchase of a MALDI-TOF 
mass spectrometer (matrix-assisted laser desorption/ionization time 
of flight), which would allow Dorrestein to do mass-spectrometry 
imaging. “That changed the whole world around,” he says.


SPACE CRUSADERS 

As well as identifying molecules in a sample, mass-spectrometry 
imaging provides spatial information. MALDI-TOF uses a laser to heat 
up and ionize molecules. By scanning that laser across a 2D sample, 
researchers can capture an ‘image’ that shows exactly where different 
molecules in the sample reside. The technique can be used to iden- 
tify and locate biomarkers in slices of tumours, but with his interest 
in microbes, Dorrestein wondered whether he could take colonies of 
bacteria on a Petri dish and scan them directly to see the metabolites 
they produce. 
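
One way to picture how a scanned sample becomes an 'image' is sketched below, assuming the instrument yields one spectrum per laser position. The data layout, the function name `ion_image` and the toy numbers are all hypothetical; real imaging software works on far larger grids and vendor-specific file formats.

```python
def ion_image(scan, target_mz, tolerance=0.01):
    """Return a 2D grid of summed intensities near target_mz.

    scan[row][col] is the spectrum recorded at one laser position,
    given as a list of (m/z, intensity) pairs.
    """
    return [
        [
            sum(intensity for mz, intensity in spectrum if abs(mz - target_mz) <= tolerance)
            for spectrum in row
        ]
        for row in scan
    ]


# Toy 2 x 2 scan with invented values.
scan = [
    [[(195.088, 120.0), (301.141, 5.0)], [(301.141, 40.0)]],
    [[(195.088, 15.0)], []],
]
image = ion_image(scan, target_mz=195.088)
for row in image:
    print(row)
```

Each cell of the resulting grid says how much of the chosen ion was detected at that position, so plotting the grid as a heat map shows where the molecule sits in the sample.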

No one had ever tried it. Dorrestein suspects that they were afraid of 
getting their expensive mass spectrometers dirty — “and this is as dirty 
as it comes, putting microbes directly into the instrument”. So he tried 
a simple experiment, asking an undergraduate student, Sara Weitz, to 
scan a colony of Bacillus bacteria. 

The images generated “weren't the prettiest”, Dorrestein says, but
they indicated that the process worked. He sent them to Paul Straight, a
microbiologist who had just joined the faculty at Texas A&M University
in College Station. “I'm pretty sure his jaw dropped,” Dorrestein says.
Together, the two teams used mass-spectrometry imaging on colonies 
of Bacillus subtilis and Streptomyces coelicolor grown next to one another. 
By exploring the spaces where the colonies interacted, they were able to 
identify molecules that the microbes use to compete with each other.

Actually visualizing this microbial arms race, Dorrestein says, makes 
him think back to 1928, when Alexander Fleming isolated penicillin 
from a mould that was killing bacteria on a dish. Mass-spectrometry 
imaging could quickly reveal the chemistries of such interactions, and 
perhaps speed up the search for new antibiotics. 

Dorrestein decided to shift his lab to focus almost exclusively on these 
methods. He was still an early-career investigator, and almost everybody 
he knew discouraged him from taking such a big risk. But Taylor pushed 
him to apply for tenure right away. “Pieter’s potential to think outside 
the box in the analytical and computational arenas was immediately 
evident,” Taylor says. “His research took off very rapidly.”

The problem with looking at dirty samples is that they produce 
messy data. Scanning microbial landscapes produces thousands of bar 
codes, but it’s largely unknown what
they correspond to; they haven't been
annotated. “It’s the equivalent of looking
under the lamp post,” Dorrestein says:
one can only ‘see’ the molecules that 
have been identified before, and the 
vast majority haven't. This is currently a 
big challenge for the field, says Jansson. 
“It’s possible to analyse features by mass 
spec, but still very difficult to identify 
what those features are.” 

To help to make sense of the heaps 
of data, Dorrestein worked with Nuno 
Bandeira, a computational biologist at 
UCSD, on an approach that classifies 
bar codes and the molecules to which 
they correspond according to their 
relationships with other annotated
molecules. This allows researchers to start
predicting, computationally, the structures and functions of thousands 
of metabolites. But there's still a dearth of annotation: although thou- 
sands of people worldwide conduct mass-spectrometry research, most 
annotate only the few molecules that they’re interested in. 
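
The idea of classifying bar codes by their relationships to annotated molecules can be sketched as a similarity network. The example below is a simplified stand-in, not the published method: it uses plain cosine similarity on binned peaks, a hypothetical threshold of 0.7, and invented spectra and labels. It links similar spectra and then lists, for each unannotated spectrum, the annotations carried by its neighbours.

```python
import math


def cosine(a, b):
    """Cosine similarity of two spectra given as {binned m/z: intensity}."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_network(spectra, threshold=0.7):
    """Link every pair of spectra whose similarity reaches the threshold."""
    names = sorted(spectra)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if cosine(spectra[x], spectra[y]) >= threshold
    ]


def related_annotations(edges, annotations):
    """For each unannotated node, list the annotations of its linked neighbours."""
    hints = {}
    for x, y in edges:
        for node, neighbour in ((x, y), (y, x)):
            if node not in annotations and neighbour in annotations:
                hints.setdefault(node, []).append(annotations[neighbour])
    return hints


# Invented spectra: one annotated, two unknown.
spectra = {
    "known_A": {100: 10.0, 150: 5.0, 200: 1.0},
    "unknown_1": {100: 9.0, 150: 6.0, 210: 1.0},
    "unknown_2": {300: 8.0, 350: 2.0},
}
annotations = {"known_A": "hypothetical compound class"}
edges = build_network(spectra)
print(edges)
print(related_annotations(edges, annotations))
```

In this toy case only `unknown_1` shares peaks with the annotated spectrum, so it inherits a hint while `unknown_2` stays unexplained, which mirrors the annotation gap described above.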

So, beginning in 2014, Dorrestein and graduate student Mingxun 
Wang from Bandeira’s lab started to develop a way to crowdsource 
annotation. They launched the Global Natural Product Social Molecu- 
lar Networking website, a repository and data-analysis tool that enables 
researchers to uncover relationships between related molecules, group 
similar ones together and compare data sets. “This is something he’s 
brought to the field that has really helped,” says Jansson.


TEAM WORK 

One of the keys to Dorrestein’s success has been his collaborations. Rob 
Knight, a leader in microbiome DNA and RNA sequencing, works just 
across the quad from Dorrestein’s office. They've teamed up to blend 
sequencing with mass spectrometry. Last year, a postdoc in Dorrestein’s 
lab, Amina Bouslimani, took swabs from one male and one female vol- 
unteer, at 400 spots on their bodies — twice. One swab from each spot 
went to Knight's lab so that the microbes in it could be sequenced, and 
the other went for mass spectrometry to identify the chemicals, natural 
and artificial, that coexist with the microorganisms. 

The participants had refrained from showering or using cosmetics for 
three days, but the chemical signatures from the hundreds of different 
types of microbe in the samples were overwhelmed by chemicals from 
beauty and hygiene products. Still, the researchers did find correlations
between microbe communities and local chemistries: for example, the 
bacteria found in the vaginal area were correlated with molecules asso- 
ciated with inflammation. Such connections, Dorrestein says, could be 
used to generate hypotheses about host-microbe interactions. 

Bouslimani is now analysing samples from volunteers’ hands and 
from personal items such as their mobile phones. The work, which has 
not yet been published, has shown that people leave persistent chemical 
signatures on the objects that they touch — like those in the image of 
Dorrestein's office. 

Bouslimani and Dorrestein think that this could have applications in 
forensic science. A suspect could be swabbed to determine whether the 
chemical signature of his or her skin matches that at a crime scene. Or in 
the absence of DNA or fingerprint evidence, the chemicals that a crimi- 
nal leaves behind could help to provide a lifestyle profile: a composite 
sketch of the products that they use and the mixture of microbes they 
carry. “Maybe the chemical signature can help the investigator narrow 
down who was there,” says Bouslimani. 

Last year, Dorrestein teamed up with microbiologist Maria 
Dominguez-Bello of New York University and several others who 
wanted to see what human skin and its microbial diversity look like 
when people grow up free of the trappings of the developed world. They 
Caffeine appears as coloured spots in a 3D visualization.


collected samples from some remote 
tribes — one near Manaus, Brazil, and 
Tanzania’s Hadza people — and com- 
pared them with swabs from non-tribal 
people near the collection sites. Using 
Dorrestein’s mass-spectrometry tech- 
niques, they’ve found that people in 
the tribes have more-diverse microbial 
communities and skin chemistry than 
those living a more modern lifestyle. 
The ongoing work is serving up some 
surprises too, says Dorrestein. People 
from one village in Brazil had a range 
of pharmaceuticals on their skin, indi- 
cating that they had more contact with 
outsiders than previously suspected. 

Dorrestein has a way of leaning 
forward and almost standing on his 
toes in excitement when he talks about 
the technology and how it might help to assess the health of oceans, or 
improve efficiency in agriculture, a major contributor to greenhouse-gas 
emissions. But when asked how he chooses projects to pursue, it’s work 
on human health that he mentions first. “To us, that’s a really obvious, 
direct application of this — we want to help patients,” he says. 

Dorrestein teamed up with Knight, Doug Conrad — director of 
UCSD’s adult cystic fibrosis clinic — and others to develop a rapid 
microbial diagnostic test. Cystic fibrosis causes a build-up of mucus 
in the lungs, which can periodically become infected with bacteria. 
These infections require aggressive treatment with antibiotics — and 
sometimes the bacteria can develop resistance. Dorrestein and his
collaborators have shown how analysing mass-spectrometry data on a
phlegm sample from someone with cystic fibrosis can identify microbial 
communities that standard medical culturing techniques miss. 

Louis-Félix Nothias-Scaglia, a postdoc who joined Dorrestein’s lab 
this year, is mapping the skin of people with psoriasis, a condition 
thought to be triggered by an overactive immune system. If molecules 
produced by certain bacteria are present when the condition flares up 
but not when the skin is healthy, Nothias-Scaglia explains, they might 
point to drugs that could treat or even prevent the disease. Even being 
able to use microbial changes to predict when a flare-up is coming 
would enable patients to reduce their use of immune-suppressing drugs. 

Turning such data-intensive techniques into standard lab tests will 
be a challenge. “Cynics would say it’s too complicated, it’s never gonna
go anywhere,” says Conrad. “To a certain extent, I can understand that.
But that’s a good way to keep going the way things are.”

Dorrestein definitely wants to change the way things are, particularly 
for the blossoming field of microbiome research. He views the disci- 
pline as passing through phases: the first has centred on determining 
the identity of microbes. The second phase is working out what they’re 
doing, using techniques such as mass spectrometry. 

What drives the establishment of these communities? What metabolic 
processes are under way, and how do they interact with each other and 
with a host? “If you fundamentally understand that,” Dorrestein says,
“you can start to take control of it.” And that’s the third phase, he says —
taking control. By monitoring microbial communities, is it possible to 
add the necessary ingredients to change a person’s health, their mood, 
their athletic performance? Dorrestein thinks that the answers to these 
questions are right in front of him. He just has to look a little closer.


Paul Tullis is a freelance journalist in Los Angeles, California. 
Common compliance situations can get good researchers into
trouble, warn James M. DuBois and colleagues. 


In 2013, we launched a training programme
that no scientist wants to list on their CV.
Participants are referred to it by
their home institutions, usually after having
their research privileges suspended. Several
times a year, a small group of researchers
arrives in St Louis, Missouri, for a course that
we designed to help participants regain their
research privileges. We are grant-funded faculty
members with backgrounds in psychology
and research ethics. The programme
— initially called RePAIR, now called the
Professionalism and Integrity Program, or
PI Program — was developed with funding
from the US National Institutes of Health,
and participants pay a fee to attend.

About half of our participants are enrolled 
after a failure of oversight that resulted in the 
publication of false data or faulty consent. 
Other common reasons include plagiarism 
and documentation problems. Researchers 
are often referred on multiple grounds (see 
‘Why attendees enrolled’; and Supplemen- 
tary Information; go.nature.com/25ao1sd). 
Many arrive convinced that they have
been misjudged. One said, “The programme 
sounds like it’s for criminals — not me.” 

Three intense days later, attitudes change. 
Participants generally express gratitude 
for the programme. A year later, follow-up 
surveys indicate that the vast majority have 
changed how they work. We, the programme 
instructors, have also adjusted our research 
practices: we now appreciate how easy it is 
to run afoul of rules when busy schedules 
collide with complex projects.

Before arriving at our small-group
workshop, participants complete a battery 
of assessments. On the final day, they write 
a professional-development plan, which 
outlines strategies to, say, hold regular team 
meetings, seek further training on compli- 
ance rules, or restructure workloads. Over 
the next three months, we conduct an aver- 
age of three coaching telephone calls with 
alumni as they implement their plans. 

We began our programme not knowing 
who would attend, or even whether what 
we were doing was a good idea. Three years 
in, we’re convinced that it is worthwhile.
We have now trained 39 researchers from 
24 institutions. Researchers in our pro- 
gramme do not display personality traits that 
are distinct from the general population of 
scientists. We believe that most researchers 
may be susceptible, and that the busiest ones 
are most likely to err. 


WHAT WENT WRONG? 

There are high-profile cases of serial 
fraudsters who have consciously built their 
careers on fabricated data and who, some 
research suggests, have personality disorders.
We do not encounter such individuals
in our programme. In general, we work
with talented faculty members who seek to 
do good research and whom institutions 
wish to retain. 

To understand what led to these individu- 
als’ mistakes, we draw on several data sources. 
Over the past two years, we have surveyed 
participants on factors such as their knowl- 
edge of research rules, job-related stress, how 
much an individual uses cognitive distortion 
to justify compliance breaches, and their
use of strategies that improve professional
decision-making. We compared their survey
scores with those of a national sample of
400 researchers funded by the US National 
Institutes of Health. Our participants did 
not differ from this sample’s mean scores
on any measures thought to be connected
to compliance or research integrity.

We also profiled participants’ work-related
attributes. The Clifton
StrengthsFinder is a test to identify employees’
strongest talents. It is widely used by
organizational psychologists in corporate
settings, and its validity and reliability have
been documented in diverse professions. 
Most of our participants had top scores in 
2 of 34 possible areas: achiever and learner. 
This is not surprising, given that all partici- 
pants had doctoral degrees and had been in 
full-time employment in research institu- 
tions. However, test results showed a dearth 
of compliance-related talents, such as focus, 
discipline and consistency. 
WHY ATTENDEES ENROLLED

Frequent reasons behind researchers’ referrals
to the Professionalism and Integrity Program
(many are referred for more than one reason).

Failure to provide oversight, leading to problems: 49%
Consent violation concerning human research participants: 31%
Plagiarism: 21%
Inappropriate recruitment of human research participants: 18%
Animal-care violation: 15%
Data fabrication, falsification or substandard research leading to false data: 13%
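
The percentages listed under ‘Why attendees enrolled’ add up to more than 100, which is what one would expect when researchers are referred on multiple grounds. A minimal sketch of that arithmetic follows, assuming all six figures are shares of the same group of participants (an assumption; the box does not state the denominator). The shortened labels are paraphrases of the listed reasons.

```python
# Percentages as listed under 'Why attendees enrolled'.
referral_reasons = {
    "Failure to provide oversight": 49,
    "Consent violation": 31,
    "Plagiarism": 21,
    "Inappropriate recruitment": 18,
    "Animal-care violation": 15,
    "Data fabrication, falsification or substandard research": 13,
}

total = sum(referral_reasons.values())
print(total)        # 147
print(total / 100)  # 1.47
```

Under that assumption, the second figure is the average number of these listed reasons per participant.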


Although we do not have a national 
comparison sample for this test, a meta-anal- 
ysis suggests that most scientists would share 
such a profile. Among the most common 
traits associated with scientific creativity are: 
being less conventional, being more open to 
new experiences, and ambitiousness.

During day two of the workshop, 
participants share with the group what led 
to their referral. Assurances of confidential- 
ity and the presence of others in similar situ- 
ations mean that participants are markedly 
forthright. We reflected on their stories and 
other information gathered to contemplate 
why they found themselves in trouble. For 
most, we identified multiple causes (see 
‘Why researchers stumbled’). 

Three causes played a part in most cases: 
paying too little attention to details or over- 
sight; being unsure about relevant rules; and 
not prioritizing compliance. All these could 
be attributed to other, more basic causes. For 
example, many participants provided too 
little oversight of their teams because they 
were overextended or understaffed. People 
sometimes were unsure of rules after mov- 
ing into a new area of research. They also 
encountered regulations that had grown 
more complex since they completed their 
training. 


THREE MYTHS ABOUT MISCONDUCT 

Our experience with programme partici- 
pants challenges several misconceptions 
about misconduct. 
Only bad apples get into trouble. We do
not want to minimize the seriousness of 
participants’ violations. Institutions face 
large fines for non-compliance with federal 
rules; deficient research hurts the quality of 
scientific literature, and human and animal 
subjects may be put at risk. Nevertheless, 
participants’ infractions rarely resulted from 
a conscious intent to mislead or break rules. 

Often, programme participants had in 
their research tried to ensure that the funda- 
mental purpose of compliance was satisfied 
— for example, that research participants 
understood and freely consented to study 
conditions, that animals were kept pain free, 
or that data supported conclusions. They did 
so, however, in ways that fell short of full com- 
pliance. As one participant put it: “Prior to 
this situation, I tried to follow the spirit of the 
law — now I try to follow the letter of the law.”

Though our participants were not differ- 
ent from other researchers in terms of their 
knowledge of rules, knowledge deficits some- 
times played a big part in their situations. 
Most researchers function fairly well within 
complex regulatory systems, but problems 
arose when they entered new territory — 
for example, running their first clinical trial 
involving drugs or devices subject to regula- 
tions of the Food and Drug Administration. 

The role of culture in research compliance 
is rarely discussed. For example, in the United 
States, plagiarism is taken much more seri- 
ously than violation of rules about how credit 
for work is assigned, but such common values 
are rarely expressed explicitly. Research-ethics 
programmes often attach the same signifi- 
cance to all rules, and may leave some scien- 
tists ignorant of unspoken norms. 

A slight majority of our participants came 
to the United States from elsewhere; roughly 
double the expected proportion on the basis 
of the demographics of the US scientific 
workforce. (Our previous studies indicate 
that nation of birth is a stronger predictor 
of scores on key measures than is nation of 
training.) Although many participants were 
accustomed to US compliance requirements, 
they sometimes employed or mentored peo- 
ple who were not. Some of these junior lab 
members made poor assumptions about 
appropriate processes that participants failed 
to correct. For example, one participant did 
not review a postdoc’s data and analyses 
because he felt doing so would imply mistrust. 
We suggested that the participant frame his 
reviews of lab members’ work as modelling 
appropriate behaviour, explicitly telling post- 
docs that they would also be expected to pro- 
vide this kind of oversight if they headed a lab. 


Scientific skills are enough to do good 
science. The idea behind the StrengthsFinder 
assessment is that people tend to be strong in 
some areas but not in others. Scientists have 
long had to master skills associated with, for 


instance, artists (creativity) and accountants 
(attention to detail). In an increasingly com- 
petitive environment, requirements for both 
talents have increased, and scientists must 
also have the good communication and nego- 
tiation skills often associated with politicians. 
Building teams with complementary skills is 
ideal, but it is not always easy or even possible 
when resources are limited. 

How researchers communicate with 
others (team members and institutional 
officials) matters greatly. Communica- 
tion should be clear, balanced and non- 
threatening, especially when an institution 
questions a researcher’s compliance. For 
example, informing institutional officials 
that your grants pay their salaries is rarely 
effective in resolving concerns. 

When situations demand skills that a team 
does not have, compensating strategies may 
be helpful. For example, standard operating
procedures (SOPs) or checklists can facilitate 
the consistent performance of compliance 
tasks — such as getting approval for each 
animal-research protocol — even when tasks 
or team members are new, or when people 
are tired. The principal investigator who 
creates SOPs for compliance and data integ- 
rity also sends the message that these are as 
important as the science itself. Even with
SOPs, principal investigators must actively
oversee compliance activities and ensure that
staff are adequately trained, competent and
diligent. Regular meetings can help. Principal
investigators are accountable for the
integrity of their research projects — they
must find ways to assure it even when they
themselves are not detail-oriented.


The more publications and grants the
better. By the metrics that institutions use to
reward success, our programme participants
were highly successful researchers; they had
received many grants and published many
papers. Yet, becoming overextended was
a common reason why they failed to adequately
oversee research. It may also have
led them to make compliance a low priority.
People who are too busy must triage, and
what scientist wants to prioritize checking
patient signatures above data gathering?
Principal investigators should protect
themselves and their labs by taking on no
more projects than they can responsibly
oversee and adequately staff. In general, to
do more, researchers need more resources
— space, trained staff and mentoring when
moving into new areas. A clear lesson from
our programme is that compliance requires
individual integrity as well as departmental
and institutional support. Scientists become
overextended in part because their
institutions value large numbers of projects.


WHY RESEARCHERS STUMBLED

Instructors on the Professionalism and Integrity Program assessed underlying causes
(often more than one) for researchers’ lapses.

Proximate cause: Lack of attention
Ultimate cause of researcher lapse: Overextended, not detail-oriented or


GOOD INVESTMENT 

Before its launch, the programme was 
criticized on principle: why allocate 
resources to assist rule breakers to regain 
their research privileges? 

We think these are resources well spent. 
Questionable research practices are much 
more widespread than we would like to 
believe. Following the workshop, our participants
demonstrate more positive attitudes
toward compliance, improved problem- 
solving skills and better lab-management 
habits. Proactive, preventive training is often 
recommended as a way to boost research 
integrity, but there is little evidence that 
increasing one-size-fits-all training changes 
behaviour. Certainly, sending someone back 
to repeat a standard course after a finding of 
misconduct seems unlikely to help.

In our view, intense, individualized training 
following a breach can be remarkably effec- 
tive. And it is unquestionably much more 
cost efficient than letting problems fester until 
even bigger problems arise for investigators 
and institutions. Our participants have gone
from analysing their own lapses to custom- 
izing solutions, such as holding more face-to- 
face meetings or developing SOPs. 

Our experience with the course has made 
us more compassionate to the participants, 
and more cautious about our own behav- 
iours. The message that we want to send 
is this: unless you are careful, it could hap- 
pen to you. Learning to do compliance well 
ensures data integrity and protects human 
and animal subjects; it also protects the 
careers of researchers and their colleagues.


James M. DuBois is director of the
Center for Clinical & Research Ethics at
Washington University School of Medicine, 
St Louis, Missouri, USA. John T. Chibnall 
is professor of psychiatry, Raymond Tait is 
professor of psychiatry and vice-president of 
research, and Jillon Vander Wal is professor 
CANCER THERAPY


Defining stemness 


Hans Clevers admires an analysis of stem-cell science 
that sharpens up some of the fuzziness in the field. 


Cancer Stem Cells: Philosophy and Therapies
Lucie Laplane
Harvard University Press: 2016.


I’ve always felt uncomfortable about
the concepts and definitions that we
use in the stem-cell field. Some of the
arguments seem circular; observation and
assumption are not well separated. I once
asked a colleague for their best definition
of a stem cell. The answer: a cell that can
self-renew. What, then, is self-renewal? The
immediate reply: what stem cells do.

Fuzziness in stem-cell concepts and definitions
has significant consequences. It
affects how we design, conduct and interpret
experiments, how we communicate
our discoveries and, ultimately, how we
design therapies aimed at supporting the
regenerative capacity of healthy stem cells
or eradicating those that fuel the growth
of tumours. Despite these concerns, as an
experimentalist I could never put my finger
on where exactly scientific common sense
is failing.

Enter Lucie Laplane and her book
Cancer Stem Cells.
Trained as a science
philosopher, Laplane
also spent time at the
bench in two stem-cell
labs. Her book
is the culmination of
a six-year effort to
describe and structure
the philosophical
underpinnings of
stem-cell science. In
addition to absorbing
essentially all the
relevant experimental
literature — historical and scientific — she
interviewed some of the leading international
stem-cell researchers and clinicians.
She discussed her emerging insights
with fellow philosophers and science historians.
Starting from an interest in cancer
stem cells (CSCs), the book, despite
its title, builds a
much broader
framework for
understanding the
biology of stem cells of all types.

Central to CSC theory is the observation 
that not all tumour cells are equal. The bulk 
of a tumour consists of short-lived prolif- 
erative cells and differentiated cells. But 
some tumour cells seem to be the malignant 
equivalents of tissue stem cells. Much as nor- 
mal stem cells maintain healthy organs by 
producing new tissue cells, CSCs drive the 
persistence of malignant tumours by pro- 
ducing new cancer cells. 

CSC theory tacitly assumes that CSCs 
carry the armaments of normal stem cells: 
they are built to last a lifetime; are resilient 
to many kinds of chemical or physical insult; 
and can ‘slumber’ for prolonged periods. 
CSCs would thus be capable of surviving 
chemotherapy and radiation, explaining why 
local recurrence is the almost inevitable out- 
come of such treatment. And metastases that 
sometimes appear many years after the exci- 
sion of a primary tumour would be caused 
by quiescent CSCs that have wandered to 
distant sites. Thus, CSC theory explains 
why cancer patients can never be consid- 
ered cured, even when a treatment outcome 
seems encouraging. Most important, CSC 
theory promises the development of inno- 
vative treatments, aimed not at reducing the 
bulk of a tumour, but at taking out its ‘beat- 
ing heart’: the cancer stem cells. 

Laplane starts her comprehensive over- 
view with a description of how the popular- 
ity of CSC theory has exploded over the past 
two decades, driven by the rapid develop- 
ment in cell-sorting technology. She then 
gives an insightful historical account start- 
ing from nineteenth-century giants includ- 
ing Theodor Schwann and Rudolf Virchow. 
Almost as an aside, she describes the work 
of Leroy Stevens and Barry Pierce on trans- 
plantation of cells from teratocarcinomas, 
eerie growths that can contain any type of 
tissue, including teeth and hair. This eventu- 
ally led to the discovery and ‘domestication’ 
of their healthy physiological counterparts, 
the embryonic stem cells that generate all tis- 
sue types in early embryos. 

She discusses, too, how Canadian stem- 
cell biologist John Dick revived the field 
in the 1990s by developing ways to study 
the behaviour of different types of human 
leukaemia cell by transplanting them into 
mice. He merged this with modern insights 
into the nature of the healthy stem cells that 
give rise to all types of blood cell (haema- 
topoietic stem cells). 


A stem cell seen under
a scanning electron 
microscope. 


He discovered that, rather than producing normal blood cells,
they continuously produce leukaemia cells. 
Thus, Laplane concludes that the definition 
and study of CSCs is inseparable from that 
of normal stem cells. 

Laplane defines a stem cell as one “capa- 
ble of self-renewal and of differentiation”, 
where self-renewal is the ability to recreate 
one copy of itself on division. Laplane reveals 
how for some time the field was describing 
two very different entities as CSCs: the cells 
from which tumours originate; and those 
inside tumours that drive their growth in the 
long term. Apples and oranges. The field has 
also wrestled with the gold-standard assay 
introduced by Dick: the transplantation of 
sorted human cancer cells into mice, where 
the cells that grow out into tumours are considered CSCs. It is
arguable what this ‘surrogate’ assay actually measures. Does the
outgrowth of heavily manipulated cancer cells in mice really reflect
the behaviour of the same cells in the original tumour? Laplane
acknowledges these issues. In a second edition, she may want to touch
on how researchers use genetic marking of stem cells to trace their
derivatives through solid tissues.

“Fuzziness in stem-cell concepts has significant consequences, but I
could never put my finger on where scientific common sense is failing.”

Laplane’s rigorous analyses unveil deep semantic and conceptual
problems in the field. She arrives at a framework of four possible
versions of ‘stemness’: two intrinsic, two extrinsic. She suggests
that it can be categorical (an intrinsic property of a stem cell,
independent of its environment); dispositional (an intrinsic property
of a stem cell that emerges only in the right environment); relational
(an extrinsic property induced in a cell that would otherwise be a
non-stem cell by its microenvironment); or systemic (an extrinsic
property of a system such as tissue, rather than an individual cell).

I suspect that there is no current consensus on where to fit even the
best-studied stem-cell types into this framework. Yet Laplane’s
stemness framework should be of great value. It will help to clarify
definitions and concepts, even if it only provides solid ground from
which to disagree. Moreover, the framework can readily be applied to
experimentation. A philosopher may indeed have straightened out the
stem-cell field.


Hans Clevers investigates stem cells and
cancer at the Hubrecht Institute in Utrecht,
the Netherlands. He is also the research
director of the national paediatric oncology
hospital Princess Maxima Center in Utrecht.
e-mail: h.clevers@hubrecht.eu


Books in brief


Complexity: The Evolution of Earth’s Biodiversity and the Future of Humanity

William C. Burger PROMETHEUS (2016)

Botanist William Burger conducts a grand tour of life’s complexity,
emphasizing cooperation and symbiosis in evolutionary history. He
segues deftly from the towering success of beetles and bacteria to
the formation of new species and the distribution of biodiversity. The
story culminates with humanity’s cognitive and cultural hegemony.
But however ascendant we are as a species, Burger dispassionately
notes, our explosive global population growth and overuse of
resources mirror the behaviour of locust swarms.


Can Neuroscience Change Our Minds? 

Hilary Rose and Steven Rose POLITY (2016) 

The seepage of neuroscience into economics and policy should be
deeply questioned, argue sociologist of science Hilary Rose and
neuroscientist Steven Rose in this crisp, astringent analysis. In a 
historically and scientifically contextualized critique of this 
“data-rich and theory-poor” discipline, they examine claims made 
for the US and European ‘big brain’ projects, and for the findings that 
feed into UK policy on child-rearing and early education. Ultimately, 
they aver, neuroscience can indeed change our minds — but social 
and political understanding of the issues must be factored in. 


What a Fish Knows: The Inner Lives of Our Underwater Cousins 
Jonathan Balcombe FARRAR, STRAUS AND GIROUX (2016) 

More than 30,000 species of fish — about half of all vertebrates — 
roam global waters. And as ethologist Jonathan Balcombe notes in this 
engrossing study, breakthroughs are revealing sophisticated piscine 
behaviours. Balcombe glides from perception and cognition to tool 
use, pausing at marvels such as ocular migration in flounders and the 
capacity of the frillfin goby (Bathygobius soporator) to memorize the 
topography of the intertidal zone. Yet, he argues, the over-exploitation 
of wild stocks, notably of apex predators such as tuna, points to the 
need for change on moral as well as ecological grounds. 


Drive! Henry Ford, George Selden, and the Race to Invent the Auto Age

Lawrence Goldstone BALLANTINE (2016) 

Historian Lawrence Goldstone follows the momentous patent war 
that ended in 1911, when George Selden’s case for a patent on a 
“road carriage” powered by internal combustion was broken by arch- 
industrialist Henry Ford, who adapted existing technology to craft the 
wildly successful Model T. Goldstone weaves in accounts of European 
innovators such as Karl Benz, and road races such as the 1907 
Peking-to-Paris dash. But as the market-savvy maverick who “did not 
so much create demand as anticipate it”, Ford dominates the story. 


Silent Sparks: The Wondrous World of Fireflies 

Sara Lewis PRINCETON UNIVERSITY PRESS (2016) 

The pulsing glow of massed fireflies is a nocturnal wonder of nature. 
Biologist Sara Lewis has spent decades studying these beetles of 
the family Lampyridae, which spans nearly 2,000 species. Here she 
expounds on firefly metamorphosis, courtship, reproduction and 
bioluminescence — from the exquisite anatomy of the Photinus 
firefly’s lantern to the chemical ‘light switch’ that enables flash 
control. (A field guide to North American fireflies is included.) An 
illuminating peek into a fascinating corner of field biology. Barbara Kiser 
Soldiers benefit from, and are subject to, huge amounts of research.


Among the warriors 


Sharon Weinberger finds much to amuse and disturb in 
Mary Roach’s tour of conflict’s wilder shores. 


In Grunt, Mary Roach reveals herself as the
kind of writer you want at a nap-inducing
military press conference. In a culture
where dead civilians are ‘collateral damage’
and air strikes are ‘kinetic operations’, Roach
has a way of peeling back the euphemisms to
get to some of the true horrors of war.
Unbelievably, she often manages to make it funny.
The book, a tour of the scientific world of
“humans at war”, sees Roach uncovering the
Medicare reimbursement code for maggots 
and learning how combat medics practise 
treating an evisceration (the stand-in for 
faeces-filled intestines involves dyed oatmeal 
and worse). To find humour in the carnage of 
war, a writer must walk a fine line, and most 
of the time Roach does so deftly. Her previ- 
ous works published by W. W. Norton have 
focused on sex (Bonk, 2008), space travel 
(Packing for Mars, 2010), cadavers (Stiff, 
2003) and other subjects ripe for snappy one- 
liners. War is more challenging. Yet even her 
chapter on genital injuries — urotrauma — 
strikes an appropriate balance. It is at once 
harrowing, fascinating and depressing. 
Roach is at her best capturing the rational 
absurdities of the US military, an institution 
at perpetual war. For instance, there is a task- 
group dedicated to the “hook-and-loop fas- 
tener” (preferred by snipers over the noisier 
Velcro, which can reveal their position). If the
devil is in the details, then war is all devils:
“Only a military clothing designer’s portfolio
would include a mitten that accommodates a
lone forefinger in firing position.”

Roach’s writing has been criticized as 
superficial, but that is not its greatest weak- 
ness here. Rather, there is something disturb- 
ing about approaching military science as if it 
were all so awesome. A public-affairs officer 
is “likeable”, a medical researcher “gorgeous”. 
Who knew that everyone in and around the 
military was so “droll and adorable”? Roach 
may never have seen a military representa- 
tive stonewalling a legitimate inquiry, as hap- 
pened when the military faced allegations 
of neglect at the Walter Reed Army Medical 
Center in Washington DC in the 2000s. Or 
perhaps she is hesitant 
to admit it, because 
that is not her style. 
More worryingly, Roach’s happy world of military science does not
include cases in which the military treated people as human guinea
pigs, such as in radiation experiments at the height of the cold war.

Grunt: The Curious Science of Humans at War
MARY ROACH
W. W. Norton: 2016.

© 2016 Macmillan Publishers Limited. All rights reserved.

Roach does give a glimpse of the military’s underbelly. Her
chapter on the implications of diarrhoea as 
an impediment to combat readiness takes her 
to the secretive Camp Lemonnier in Djibouti. 
This passage says much about the US mili- 
tary, which allowed a writer interested in 
loose stools onto this drone-operations hub, 
but has been reluctant to let in national- 
security reporters. She corners one of the 
base’s mysterious “special operators”; he 
reveals little about operations, but speaks 
freely on his bowel movements. 

There are some significant gaps. Roach 
does not, she concedes, cover post-trau- 
matic stress disorder (PTSD), “not because 
PTSD doesn’t deserve coverage, but because
it has so much, and so much of it is so very 
good”. That is a shame: Roach’s unique 
voice might have done great things for the 
mouse models that military researchers use 
to study psychological trauma (imagine a 
very timid mouse being menaced by a large 
one bred for aggression). It’s also troubling 
because PTSD, along with traumatic brain 
injury, is an important part of current mili- 
tary science. Among other subjects, the 
military is looking at the links between 
traumatic brain injuries and dementia (see 
Nature 477, 390-393; 2011). 

Also missing are the central questions that 
make military science so fascinating. For 
example, how can the exigencies of national 
security advance knowledge in areas such 
as trauma medicine, yet create decades of 
controversy in others, such as nonlethal 
weapons? Descriptions of wounds inflicted 
by roadside bombs in Iraq and Afghanistan 
are Grunt’s strongest passages, but the later 
chapters on the science of humans under 
water and the perils and benefits of flies 
on the battlefield jump around. It’s hard to 
escape vertigo as we skip from penis transplants
to shark repellent. Roach misses an
opportunity to examine how military sci- 
ence has morphed along with war. 

The most telling part of Grunt comes early 
in a chapter on heat, when Roach notes that 
“genetic differences in thermoregulation” are
important “given our seemingly permanent 
posture of fighting extremism in the Middle 
East”. The medical science of war has never 
been static, but a constant reflection of where 
and how the military fights. Here, Roach has 
inadvertently gone to the heart of the perspec- 
tive that is missing from Grunt. In the era of 
traumatic brain injury, lost limbs and PTSD, 
the focus of scientists studying humans at war 
has evolved from getting soldiers to survive 
battle to getting them to survive peace.


Sharon Weinberger is a fellow at the Radcliffe 
Institute for Advanced Study at Harvard 
University in Cambridge, Massachusetts, 

and author of a forthcoming book on the US 
Defense Advanced Research Projects Agency. 
e-mail: sharonweinberger@gmail.com 
Track the impact of
Kenya’s ivory burn 


Kenya’s government delivered
a powerful message against
elephant poaching and the
illegal ivory trade on 30 April
by burning 105 tonnes of ivory,
worth up to US$220 million.
With stockpile destruction on the 
rise, it is important to evaluate 
the impact of this strategy on 
elephant populations. 

Since 1989, 21 countries have 
burned or crushed 263 tonnes 
of ivory — most of it (86%) in 
the past 5 years (see go.nature. 
com/ivory). However, there is 
no published evidence so far that 
these events reduce poaching. 

Destroying ivory stockpiles 
risks a perverse outcome: 
ivory becomes rarer, fetching 
higher prices and increasing 
poaching and illegal stockpiling 
(see M. ’t Sas-Rolfes et al. 
Pachyderm 55, 62-77; 2014). 
This has prompted calls by some 
for a highly controlled legal 
ivory trade to secure elephant 
populations (J. F. Walker and
D. Stiles Science 328, 1633-1634; 
2010) — an option that ivory 
destruction removes. 

It is therefore crucial to track 
the effects of Kenya's largest-ever 
ivory burn. Time is short and the 
stakes are high. 

Duan Biggs* University of 
Queensland, Brisbane, Australia. 
ancientantwren@gmail.com

*On behalf of 4 correspondents (see 
go.nature.com/1rt2mhe for full list). 


China’s primates: EU 
can’t have it all ways 


We are concerned about the
prospect of China becoming a 
world leader in research involving 
non-human primates, given the 
country’s comparatively weak 
regulatory system and ethical 
framework (see Nature 532, 281; 
2016 and Nature 532, 300-302; 
2016). 

China’s relative freedom
from the “ethical pressure” you 
mention makes it attractive to 
researchers working on primates. 


But animal studies that could
fail the harm-benefit evaluation
in many Western regulatory
systems should not be allowed,
or actively encouraged, to
take place elsewhere. Far from
putting researchers under 
negative ethical pressure, the 
project-authorization process in 
the European Union was set up 
with full input from scientists and 
is often held up (by them) as an 
appropriate safeguard to promote 
good quality, ethically conducted 
science and good animal welfare 
(Nature 521, 7; 2015). You cannot 
have it both ways. 

Rather than exploiting weaker 
animal-research regulations, we 
argue that more effort should 
be invested in developing 
and validating alternative 
technologies to avoid or reduce 
the use of non-human primates. 
Penny Hawkins, Paul Littlefair 
Royal Society for the Prevention of 
Cruelty to Animals, Southwater, 
UK. 
penny.hawkins@rspca.org.uk


China’s primates: 
preserve wild species 


China is being put forward 
as a world leader in primate 
biomedical research (see Nature 
532, 300-302; 2016), even while 
its wild populations of primates 
are being lost at an alarming rate 
because of illegal activities and 
poor conservation practice. 
Take rhesus macaques 
(Macaca mulatta), the species 
most frequently used in 
biomedical research. The wild 
population of these primates in 
China was estimated at about 
200,000 in 2008, with a further 
40,000 kept in breeding centres 
(see go.nature.com/1sl2bqx). 
This number of captive 
animals has since significantly 
increased and may, along with 
the 20,000 exported each year, 
include individuals that were 
bred outside captivity (see X. Hao 
Cell 129, 1033-1036; 2007). 
Despite the changes that China 
is making on paper to improve 
its conservation policies, the
declining state of its 19 native
primate species conveys a
different story. These animals
are disappearing because of
habitat disturbance, illegal export 
and hunting — including for 
traditional medicine. 

The country seems to us to be 
more concerned with increasing 
its reputation in biomedical 
primate research. That reputation 
will be boosted by the large input 
of government funding and by 
Western researchers flocking in 
for the reasons you mention. 

China's position on 
conservation issues and on 
primate welfare should not be 
skated over. Animals are not 
exploitable, and wild populations 
should not be an afterthought. 
Alison M. Behie, Colin P. Groves 
The Australian National 
University, Canberra, Australia. 
alison.behie@anu.edu.au


Supervise Chinese 
environment policy 


China's latest five-year plan 
shifts its environmental law 
away from a pollution-control 
system and towards one that 
manages environmental quality 
(see Nature 531, 524-525; 2016). 
Regional efforts will now be 
subject to greater oversight to 
ensure that improvements are 
implemented across the country 
and to prevent local corruption. 

Under the plan, provincial 
environmental-protection 
departments will be responsible 
for unifying local monitoring 
and inspection programmes and 
for eliminating protectionism in 
local governments (see B. Zhang 
and C. Cao Nature 517, 433- 
434; 2015). China’s Ministry 
of Environmental Protection 
has already established separate 
environmental-management 
departments for water, air 
and soil. 

In my view, strict national 
supervision would help to 
keep these regional reforms on 
track and to make them more 
effective. The US Environmental
Protection Agency, for example,
manages pollution through
permits and defined standards. 
Ten regional offices work with 
individual states to implement 
these regulations. The agency can 
revoke state programmes that fail 
to fulfil their responsibilities. 

In China, the unified 
supervision of local monitoring 
and inspection by provincial 
environmental-protection 
departments, which then report 
to the ministry, is an important 
step towards improving 
environmental quality. However, 
each of the links in this chain 
must strictly enforce the 
regulations and work with the 
rest to clean up the country’s 
environment. 

Bo Zhang Information Center, 
Ministry of Environmental 
Protection, China. 
zhangbo@mep.gov.cn 
Change of identity is
not in the air 


Change is indeed in the air for 
many butterflies — at least in 
their ecology, if not in their outer 
appearance (see ‘Change is in 
the air’ Nature 532, 403-404; 
2016). However, the example 
you picture is not the European 
species Hesperia comma, but 
the North American Epargyreus 
clarus — both of which have the 
same vernacular name of 
silver-spotted skipper. 

Josef Settele Helmholtz Centre 
for Environmental Research — 
UFZ, Halle, Germany. 
josef.settele@ufz.de 


CORRECTION 

The Correspondence by 

P. Dobosz and J. Zawiła-Niedźwiecki
(Nature 532, 441;
2016) incorrectly described 
some Polish universities as 
“engaging” anti-scientific 
speakers. In fact, the speakers 
either hired university premises 
or participated in discussions 
at university conferences. 
OBITUARY


Ilkka Hanski


(1953-2016) 


Population ecologist who modelled how species cope with habitat loss. 


Ecologist Ilkka Hanski’s pioneering
work changed our understanding
of how biodiversity is maintained.
Combining mathematical modelling and 
long-term data from the wild, he developed 
metapopulation theory. This predicts the 
degree of habitat fragmentation beyond 
which a species will go extinct. 

Hanski’s 1999 book Metapopulation 
Ecology (Oxford University Press) became 
a cornerstone for researchers in population 
biology, conservation biology and landscape 
ecology. He identified the genetic basis of 
traits that underpin survival in fragmented 
habitats. Most recently, he demonstrated with 
colleagues that an increasing prevalence of 
inflammatory diseases is associated with 
declining biodiversity. 

Hanski, who died on 10 May, was born in 
1953 in Lempäälä, Finland. As a child he collected
butterflies at his grandparents’ house in
southeastern Finland. Decades later, Hanski 
reflected that many of his most successful 
research projects were inspired by these out- 
door adventures and by the encouragement 
that he received from Esko Suomalainen, a 
geneticist at the University of Helsinki whom 
he had contacted after finding a rare butterfly 
at his grandparents’ house. 

Hanski studied biology at the University 
of Helsinki and gained his doctorate, on 
the community ecology of dung beetles, 
from the University of Oxford in 1979. He 
found that most dung-beetle species clump 
together, with particular species common 
in some pats but scarce or absent in others. 

Before the 1970s, ecologists paid little atten- 
tion to whether populations were distributed 
continuously or in many local patches or both. 
Throughout his career, Hanski was intrigued 
by the ecology and evolution of species found 
in islands — naturally fragmented habitats. 
In 1969, population biologist Richard Levins 
introduced the concept of a metapopulation 
— a ‘population of populations’ — species 
living in networks of habitat patches such as 
cowpats or islands, work that Hanski built on. 
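
To make the idea of an extinction threshold concrete, here is a minimal sketch of the classic Levins patch-occupancy model, dp/dt = c·p·(h − p) − e·p, where p is the fraction of occupied patches, c the colonization rate and e the extinction rate. The parameter h (the fraction of habitat that remains suitable) is a common textbook extension and an assumption of this sketch, as are the function names and the numerical values; none of them is taken from Hanski's own models.

```python
def equilibrium_occupancy(c, e, h=1.0):
    """Equilibrium fraction of occupied patches, p* = h - e/c (zero if negative)."""
    return max(0.0, h - e / c)


def simulate_occupancy(c, e, h=1.0, p0=0.05, dt=0.1, steps=2000):
    """Integrate dp/dt = c*p*(h - p) - e*p with forward Euler steps."""
    p = p0
    for _ in range(steps):
        p += dt * (c * p * (h - p) - e * p)
        p = min(max(p, 0.0), 1.0)  # keep p a valid fraction
    return p


if __name__ == "__main__":
    # Illustrative (hypothetical) rates: colonization 0.5, extinction 0.1.
    print(equilibrium_occupancy(0.5, 0.1))          # intact habitat
    print(equilibrium_occupancy(0.5, 0.1, h=0.15))  # habitat below h = e/c
    print(simulate_occupancy(0.5, 0.1))             # numerical check
```

In this simplified picture the metapopulation persists only while h stays above e/c, which is one way to read the statement that there is a degree of habitat loss beyond which a species goes extinct.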

Returning to Helsinki after his doctor- 
ate, Hanski continued to develop models for 
metapopulation survival. By the late 1980s, he 
was ready to test his predictions in the field, 
but which insect to study, and where? He was 
inspired during a fortuitous visit by renowned 
population biologist Paul Ehrlich of Stanford 
University in California. Discussions about 
Ehrlich’s research on Edith’s checkerspot but- 
terfly (Euphydryas editha) prompted Hanski 
to choose the Glanville fritillary butterfly
(Melitaea cinxia) in the Åland Islands off
southern Finland to test his predictions. 

In the early 1990s, he set out to map all 
suitable habitat patches for M. cinxia in 
Åland. His effort has grown into a database
of more than 4,000 localities. These places 
are checked each year for M. cinxia and its 
larval host plants — as well as the parasitoids 
and pathogens of each. 

This has since become one of the most 
important model systems in population 
biology. A one-of-a-kind long-term data- 
collection effort, it is revealing how species 
and their interactions are responding to cli- 
mate change, as well as shedding light on how 
species cope with habitat fragmentation. 

In 1994, Hanski published the incidence- 
function model, which elegantly formulated 
the relationships between the area and isola- 
tion of a habitat patch and the likelihood of it 
being occupied by a species (I. Hanski J. Anim.
Ecol. 63, 151-162; 1994). This launched a new 
era of spatially explicit population models and 
was quickly adopted by ecologists. 
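
The relationship between patch area, isolation and occupancy can be sketched in a few lines. The version below follows the form in which the incidence-function model is usually summarized, without the rescue effect: extinction probability falls with patch area, colonization probability rises with connectivity to occupied neighbours, and the long-run incidence is J = C / (C + E). The function names, the parameter values and the two-patch layout are hypothetical and serve only as illustration.

```python
import math


def connectivity(i, areas, occupied, dist, alpha):
    """S_i: area-weighted, distance-discounted sum over occupied neighbours."""
    return sum(
        occupied[j] * math.exp(-alpha * dist[i][j]) * areas[j]
        for j in range(len(areas))
        if j != i
    )


def incidence(i, areas, occupied, dist, alpha, e, x, y):
    """Long-run probability that patch i is occupied (no rescue effect)."""
    s = connectivity(i, areas, occupied, dist, alpha)
    colonization = s ** 2 / (s ** 2 + y ** 2)      # rises with connectivity
    extinction = min(1.0, e / areas[i] ** x)        # falls with patch area
    return colonization / (colonization + extinction)


if __name__ == "__main__":
    params = dict(alpha=1.0, e=0.5, x=1.0, y=1.0)   # hypothetical values
    near = [[0.0, 1.0], [1.0, 0.0]]
    far = [[0.0, 2.0], [2.0, 0.0]]
    print(incidence(0, [1.0, 1.0], [1, 1], near, **params))
    print(incidence(0, [1.0, 1.0], [1, 1], far, **params))   # more isolated
    print(incidence(0, [4.0, 1.0], [1, 1], near, **params))  # larger patch
```

Comparing the three calls shows the qualitative behaviour described above: a more isolated patch is less likely to be occupied, and a larger patch more likely.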

Those of us who worked with Hanski will 
remember his sharp intellect and ceaseless 
enthusiasm for understanding nature. He was 
quick to adapt new methods and techniques. 
The Metapopulation Research Centre in Hel- 
sinki that Hanski established in 2000 consists 
of ecologists, evolutionary biologists, math- 
ematicians, bioinformaticians and molecu- 
lar biologists. Hanski also led Finland's first 
sequencing of an animal or plant genome: 
that of his Glanville butterfly, published in 
2014 (V. Ahola et al. Nature Commun. 5, 
4737; 2014). 
In 2003, nearly 25 years after completing
his thesis, Hanski returned to his beloved 
dung beetles. He launched a project in 
Madagascar to study the evolutionary biol- 
ogy of the island’s diverse endemic species 
of dung beetle and how these ecologically 
crucial communities respond to habitat loss. 
He led a series of excursions to Madagascar 
to work with local students, his team from 
Finland and his family. These trips became 
legendary, both for their scientific value and 
for the camaraderie he fostered. 

Among numerous honours, Hanski was 
awarded ecology’s top gong, the Crafoord 
Prize in Biosciences, in 2011. And despite 
his hectic schedule, he always prioritized 
public engagement. In Finland, Hanski 
was known for his views on conservation, 
in particular the protection of old-growth 
forests. Hanski was also a powerful advocate 
for basic research, criticizing science policy 
that demanded immediate economic ben- 
efit and arguing that such short-sighted aims 
threaten the fundamental process by which 
knowledge is generated. 

Training young scientists was a top prior- 
ity. Always keen to discuss ideas and offer 
feedback on manuscripts, Hanski became a 
co-author only on papers on which he felt 
he had made a significant intellectual con- 
tribution. This policy promoted the inde- 
pendence of early-career scientists working 
in the Metapopulation Research Centre. He 
loved to debate, and challenged everyone — 
regardless of their career stage — to discuss 
topics ranging from science to society. 

Ilkka lived in Helsinki with his wife Eeva 
Furman, a professor of environmental policy, 
and three children. He was dedicated to his 
family, a quality that resulted in a family- 
friendly working environment in the centre. 
After being diagnosed with cancer in 2014, 
Ilkka, with typical determination, completed 
projects closest to his heart — notably a book, 
Messages from Islands: A Global Biodiversity 
Tour, to be published in December. 

Ilkka had so much more to give and he 
touched so many in the ecology community. 
His death leaves a gap that won’t be filled.


Anna-Liisa Laine is professor of plant 
ecology at the University of Helsinki, 
Finland. Ilkka Hanski supervised her PhD 
between 2001 and 2005, and she joined the 
Metapopulation Research Centre as a group 
leader in 2010. 
HOST-MICROBE INTERACTION


Rules of the game for microbiota 


Are the dynamics of our microbial communities unique to us or does everyone’s microbiota follow the same rules? 
The emerging insights into this question could be of relevance to health and disease. SEE LETTER P.259 


KAROLINE FAUST & JEROEN RAES 


The composition of a body part's microbial
community can differ substantially from
one person to the next. This is due to
both host pressures and the dynamic behaviour 
of the microbes themselves. Understanding 
whether these interactions are consistent across 
hosts or whether each individual's microbiota 
follows its own rules has big implications. If the 
dynamics of an organ’s microbial community 
are universal, we can use them to predict effec- 
tive interventions for modulating the micro- 
biota. If, however, microbial dynamics are 
host-specific, interventions must be designed 
separately for each person. Bashan et al.⁶ address
this issue using a new approach and report their 
intriguing observations on page 259. 

To find out whether community dynamics 
are universal, ideally we should study long and 
densely sampled time series from many indi- 
viduals with different traits and backgrounds. 
Models of microbial communities should then 
be fitted to the varying proportions of micro- 
bial species, which may become challenging 
when going beyond the most dominant groups 
of species. Such large temporal data sets are 
currently gravely lacking. 

Bashan and colleagues devised an indirect 
method to address the question of universal- 
ity. They measured two independent aspects of 
community similarity: overlap, which compares 
species assemblies by quantifying the propor- 
tion of shared species; and dissimilarity, which 
assesses the difference in abundance profiles of 
the shared species between individuals. The dis- 
similarity is then plotted against the overlap for 
all sample pairs to create a dissimilarity–overlap
curve (DOC). If microbiota dynamics are truly 
universal (host-independent), then having the 
same species present should lead to the same 
relative proportion of those species, because 
they would dynamically influence each other in 
the same way. Consequently, a larger proportion 
of shared species should increase the commu- 
nity similarity and result in the tell-tale negative 
slope of the DOC (Fig. 1). 
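
The two quantities behind the curve can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that overlap is the average share of each sample's relative abundance carried by the shared species, and that dissimilarity is the square root of the Jensen–Shannon divergence between the renormalized shared-species profiles. The function names and the plain least-squares slope are choices made for this sketch; a real analysis would need a more careful fit.

```python
import math


def normalize(profile):
    """Convert raw counts to relative abundances, dropping absent species."""
    total = sum(profile.values())
    return {s: v / total for s, v in profile.items() if v > 0}


def overlap_and_dissimilarity(sample_x, sample_y):
    """Return (overlap, dissimilarity) for one pair of samples."""
    x, y = normalize(sample_x), normalize(sample_y)
    shared = sorted(set(x) & set(y))
    if not shared:
        return 0.0, None  # dissimilarity is undefined without shared species
    overlap = sum((x[s] + y[s]) / 2 for s in shared)
    # Renormalize both profiles over the shared species only.
    x_sum = sum(x[s] for s in shared)
    y_sum = sum(y[s] for s in shared)
    p = [x[s] / x_sum for s in shared]
    q = [y[s] / y_sum for s in shared]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    jsd = (kl(p, m) + kl(q, m)) / 2
    return overlap, math.sqrt(max(jsd, 0.0))


def doc_points(samples):
    """Overlap and dissimilarity for every pair of samples in a dict."""
    names = sorted(samples)
    points = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            o, d = overlap_and_dissimilarity(samples[a], samples[b])
            if d is not None:
                points.append((o, d))
    return points


def slope(points):
    """Ordinary least-squares slope of dissimilarity against overlap."""
    n = len(points)
    mean_o = sum(o for o, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    cov = sum((o - mean_o) * (d - mean_d) for o, d in points)
    var = sum((o - mean_o) ** 2 for o, _ in points)
    return cov / var
```

Under these assumptions, a clearly negative value of `slope(doc_points(samples))` would correspond to the negative slope discussed in the text, and a value near zero to a flat curve.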

The authors tested their method by simu- 
lating microbial communities computation- 
ally using what is known as the generalized 
Lotka–Volterra model⁷, to generate communities
with the same and with different
Figure 1 | Learning from similarities and differences in our microbes. To test whether microbial
communities within a specific body part have the same underlying dynamics across individuals, Bashan
et al.⁶ used a method known as the dissimilarity–overlap curve (DOC). a, If microbial community
dynamics are universal between individuals (A–C), the presence of the same species (species represented
by coloured nodes; grey nodes represent absent species) should also lead to similar species proportions
and a negative DOC slope. Consequently, a single model can be used to predict microbiota behaviour.
b, If the community dynamics are host-specific, the presence of the same species does not lead to similar
proportions and the DOC is flat. This necessitates the development of personalized models.


dynamics as positive and negative controls. In 
addition, they showed that randomizing data 
by shuffling microbial species across samples 
also removes the negative slope. These simu- 
lations confirm that the DOC detects univer- 
sal dynamics and flattens in the absence of 
such dynamics. The curve even identifies 
strongly interacting species. 
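
A toy version of such a positive control can be written with the generalized Lotka–Volterra equations, dx_i/dt = x_i (r_i + Σ_j a_ij x_j). The sketch below is an illustration only: the two-species parameter values, the forward Euler integration and the function name are assumptions of this example, not the settings used in the study. Two simulated 'hosts' share the same growth rates and interaction matrix but start from different abundances.

```python
def simulate_glv(growth, interactions, initial, dt=0.01, steps=5000):
    """Integrate dx_i/dt = x_i * (r_i + sum_j a_ij * x_j) with Euler steps."""
    x = list(initial)
    n = len(x)
    for _ in range(steps):
        rates = [
            x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
        # Abundances cannot become negative.
        x = [max(0.0, x[i] + dt * rates[i]) for i in range(n)]
    return x


# Hypothetical shared ("universal") parameters: two competing species whose
# self-limitation is stronger than their effect on each other.
GROWTH = [1.0, 1.0]
INTERACTIONS = [[-1.0, -0.5], [-0.5, -1.0]]

if __name__ == "__main__":
    host_a = simulate_glv(GROWTH, INTERACTIONS, [0.1, 0.9])
    host_b = simulate_glv(GROWTH, INTERACTIONS, [0.8, 0.2])
    print(host_a, host_b)
```

With these parameters both hosts settle at the same abundances, because the steady state depends on the shared parameters and not on the starting point. Giving each host its own interaction matrix would play the part of the negative control.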

Most notably, the team detected negative 
slopes for the oral and gut communities in 
several human-microbiome data sets, including
those of the Human Microbiome Project³
and two human-gut time series. However,
the skin microbiota displayed weakly negative 
or flat DOCs in some cases, suggesting that 
the microbial dynamics in the skin are host-specific
at certain sites. Another interesting
finding was that the DOC for the gut micro- 
biota of people recurrently infected with the 
bacterial pathogen Clostridium difficile is flat,
but gains a negative slope after faecal transplan- 
tation from people who have not been infected. 

If the assumptions hold, the consistent nega- 
tive slopes observed for the healthy cohorts and 
for people treated after infection with C. difficile 
point to universal gut microbial dynamics. This 
is good news for all modelling efforts aiming 
to predict the behaviour of the gut microbiota 
during interventions or in disease. It means 
that when parameters such as growth rates and 


interactions are determined for the gut micro- 
bial community of one healthy human, they are 
also valid for those of other individuals. Thus, 
the knowledge of such parameters can be com- 
bined across different studies and could, in the 
long term, allow a detailed, common microbial 
community model to be developed. 

The DOC method has all the hallmarks of a 
powerful analytical tool. It is easy to implement, 
addresses a crucial question and may inspire 
applications beyond its intended use. 

But, like all analyses, it makes a couple of 
assumptions — that the microbiota are in a 
steady state, and that having the same steady 
state implies that microbiota are governed by 
the same dynamics. The second assumption is 
the more risky: microbiota may end up in simi- 
lar steady states not because of their intrinsic 
dynamics, but because of a strong environ- 
mental pressure that selects for a particular set 
of species. The authors rule out obvious host 
parameters such as diet, weight, age, race and 
transit time through the gut (measured by stool 
consistency) that may shape gut microbial communities.
However, they do not account for all
factors that may conceivably influence the gut
microbiota, and so cannot provide an entirely
conclusive answer regarding the universality of 
the gut’s microbial community dynamics. 

The value of this work lies primarily in the 
importance of the question asked, the original- 
ity of the approach and the fact that it could 
spur a whole range of microbiome research. We 
expect it to spark fruitful discussions and lead 
to fresh ideas for analyses and experiments. 
For instance, it might be plausible to set up an 
artificial community under controlled con- 
ditions within a chemostat and then develop 
and define a model that describes its dynam- 
ics reasonably well. One could then compare 
the steady states reached by different subsets 
of the community to directly test the second 
assumption. If universal dynamics are con- 
firmed, modelling efforts have a better chance 
of leading to more-effective clinical interven- 
tions. Bashan and colleagues’ paper gives a 
glimpse of the deeper insights to be gained once 
we overcome the hurdles of controlled, high- 
throughput microbial community cultivation 
and manipulation.

Department of Microbiology and Immunology,
KU Leuven-University of Leuven, and in the 
VIB, Center for the Biology of Disease, Leuven 
3000, Belgium. K.F. is also in the Microbiology 
Unit, Faculty of Sciences and Bioengineering 
Sciences, Vrije Universiteit Brussel. 

e-mail: jeroen.raes@kuleuven.be 


1. Arumugam, M. et al. Nature 473, 174-180 (2011).
2. Falony, G. et al. Science 352, 560-564 (2016).
3. The Human Microbiome Project Consortium. Nature 486, 207-214 (2012).
4. Zhernakova, A. et al. Science 352, 565-569 (2016).
5. Costello, E. K. et al. Science 326, 1694-1697 (2009).
6. Bashan, A. et al. Nature 534, 259-262 (2016).
7. Stein, R. R. et al. PLoS Comput. Biol. 9, e1003388 (2013).
8. Caporaso, J. G. et al. Genome Biol. 12, R50 (2011).
9. David, L. A. et al. Genome Biol. 15, R89 (2014).
10. Youngster, I. et al. Clin. Infect. Dis. 58, 1515-1522 (2014).


GEOCHEMISTRY


Hydrogen and oxygen 
in the deep Earth 


The finding that an unusual iron oxide forms at extremely high pressures 
suggests that hydrogen and oxygen — two elements that strongly influence 
Earth’s evolution — are generated in the mantle. SEE LETTER P.241 


TAKEHIKO YAGI 


Hydrogen greatly affects the properties
of many materials. It is thought that
most of the hydrogen in modern Earth 
is in water molecules, many of which are found 
in water-bearing minerals. It is therefore cru- 
cial to understand the stability and circulation 
of such hydrous minerals in Earth’s interior, 
and this need has led to numerous studies of 
hydrous minerals under high-pressure and 
high-temperature conditions. In this issue, 
Hu et al. (page 241) cast fresh light on the
hydrogen-circulation issue. They report that
an oxygen-rich iron oxide, FeO₂, is stabilized
at pressures greater than about 76 gigapascals, 
and that this material might enable previously 
unknown hydrogen and oxygen cycles to occur 
in Earth’s mantle. 
Earth’s core is mainly made of metallic
iron, whereas the major minerals in the upper
mantle contain mostly ferrous iron (Fe²⁺). The
most abundant form of iron on Earth’s surface
is haematite (Fe₂O₃), which contains ferric
iron (Fe³⁺) and is the main constituent of iron
ore. Most of this ferric iron is thought to have
formed by the oxidation of ferrous or metallic
iron by the modern, oxygen-rich atmosphere.

On the basis of the distribution of ferric, 
ferrous and metallic iron from the surface to 
the core, it is thought that Earth’s redox state 
becomes increasingly reducing with depth, 
so that the amount of ferric iron in the lower 
mantle would be limited. High-pressure laboratory
experiments revealed that, when olivine,
(Mg,Fe²⁺)₂SiO₄ (the most abundant mineral in
the upper mantle), is subjected to conditions
corresponding to those of the lower mantle, it
changes into a mixture of two other minerals,
bridgmanite, (Mg,Fe²⁺)SiO₃, and ferropericlase,
(Mg,Fe²⁺)O. However, aluminium ions are also
found in the mantle. When these are added,
bridgmanite containing a large amount of
ferric iron forms, together with ferropericlase
and some metallic iron. More than 60% of the
total iron in bridgmanite can be ferric iron.

Figure 1 | Proposed source of hydrogen and oxygen in the lower mantle. a, Descending slabs of Earth’s
crust can be carried to the transition zone between the upper and lower mantle, where they are heated until
dense minerals form. The dense material then sinks to the bottom of the lower mantle. Hu et al. suggest
that when the mineral goethite (FeOOH, commonly formed by the reaction of the mineral haematite and
water on Earth’s surface) is carried to the mantle by a slab, an oxygen-rich iron oxide (FeO₂) and hydrogen
would form at depths greater than 1,800 kilometres. The dense FeO₂ would sink to the bottom of the lower
mantle, and might help to explain the structural complexity of the D″ layer, which lies close to the
core-mantle boundary. The highly mobile hydrogen would spread upwards. b, If the FeO₂-containing material is
lifted by motion in the lower mantle, it will break down and release oxygen at depths of less than 1,500 km.
(Adapted from a graphic by Jun Tsuchiya.)

Hu et al. now add to this picture by inves- 
tigating what happens when haematite is 
compressed in oxygen and heated to generate 
the pressure and temperature conditions that 
correspond to the deep lower mantle (78 GPa 
and 1,800 kelvin). The authors used X-ray dif- 
fraction to study the sample, but the diffrac- 
tion patterns obtained were quite ‘spotty’ and 
the sample existed neither as a powder nor as a
single crystal. In such cases, the crystal structures
of materials cannot be determined
in detail. The researchers therefore used a
method called multigrain crystallography to
analyse the spotty patterns, and concluded that
the sample is an aggregate of at least 33 single
crystals in which the haematite has changed
into FeO₂, an iron oxide that has the same
atomic structure as pyrite (FeS₂).

This finding might suggest that Fe⁴⁺,
normally a metastable form of iron, had
formed under the extreme experimental conditions,
and that its charge is balanced by two
O²⁻ ions. However, Hu et al. found that the
oxygen-oxygen (O-O) bond in FeO₂ is only
1.937 ångströms in length; by comparison, the
ionic radius of O²⁻ is 1.40 Å (ref. 6), corresponding
to an O-O bond length of 2.8 Å or greater.
The observed bond length does, however, agree
with the typical O-O bond length for a peroxide
ion (O₂²⁻). If the sample contains peroxide
ions, then the iron must be ferrous, to balance
the charge of those ions; in other words, the
iron has been reduced from Fe³⁺ in haematite
to Fe²⁺. Such a reaction is possible only at very
high pressures, because the FeO₂ has a smaller
volume than a mixture of haematite and oxygen,
and the smaller volume becomes energetically
favourable under extreme high pressures.
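
The charge-balance argument in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the two lengths quoted in the text and assumes the usual formal charges (2- for an oxide ion, 2- overall for a peroxide ion). The function and variable names are hypothetical.

```python
# Lengths quoted in the text, in angstroms.
OBSERVED_O_O = 1.937
OXIDE_IONIC_RADIUS = 1.40

def iron_charge(n_iron, anion_charges):
    """Charge each Fe atom must carry for the formula unit to be neutral."""
    return -sum(anion_charges) / n_iron

# Haematite, Fe2O3: three oxide ions, each 2-.
fe_in_haematite = iron_charge(2, [-2, -2, -2])
# FeO2 read as two separate oxide ions.
fe_if_oxide = iron_charge(1, [-2, -2])
# FeO2 read as a single peroxide ion, 2- overall.
fe_if_peroxide = iron_charge(1, [-2])

# Two oxide ions in contact sit about two ionic radii apart.
min_oxide_separation = 2 * OXIDE_IONIC_RADIUS
consistent_with_oxide = OBSERVED_O_O >= min_oxide_separation

print("Fe charge in haematite:", fe_in_haematite)
print("Fe charge if FeO2 holds two oxide ions:", fe_if_oxide)
print("Fe charge if FeO2 holds one peroxide ion:", fe_if_peroxide)
print("Observed O-O distance fits separate oxide ions:", consistent_with_oxide)
```

Because the observed O-O distance is well short of two oxide radii, the two-oxide reading is ruled out, which leaves the peroxide reading and therefore ferrous iron.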

Hu and colleagues went on to show that the
mineral goethite, FeOOH, also forms FeO₂ at
2,050 K and a pressure of 92 GPa by releasing
hydrogen. Goethite commonly forms from the
reaction of haematite and water at Earth’s surface.
The authors further demonstrated that
the FeO₂ formed in this way becomes unstable,
and probably breaks down into ferrous
oxide (FeO) and oxygen when the pressure is
reduced.
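
The two reactions described here can be written with whole-number coefficients. The coefficients in the sketch below are inferred purely from conservation of atoms and are not quoted from Hu and colleagues' paper; the helper names are hypothetical.

```python
from collections import Counter

# Atoms per formula unit for each species mentioned in the text.
COMPOSITION = {
    "FeOOH": {"Fe": 1, "O": 2, "H": 1},
    "FeO2": {"Fe": 1, "O": 2},
    "FeO": {"Fe": 1, "O": 1},
    "H2": {"H": 2},
    "O2": {"O": 2},
}

def atom_totals(side):
    """Total atoms of each element on one side of a reaction."""
    totals = Counter()
    for species, coefficient in side.items():
        for element, count in COMPOSITION[species].items():
            totals[element] += coefficient * count
    return totals

def is_balanced(reactants, products):
    return atom_totals(reactants) == atom_totals(products)

# Goethite releasing hydrogen at high pressure: 2 FeOOH -> 2 FeO2 + H2
goethite_step = ({"FeOOH": 2}, {"FeO2": 2, "H2": 1})
# FeO2 breaking down when pressure is reduced: 2 FeO2 -> 2 FeO + O2
breakdown_step = ({"FeO2": 2}, {"FeO": 2, "O2": 1})

print("goethite step balanced:", is_balanced(*goethite_step))
print("breakdown step balanced:", is_balanced(*breakdown_step))
```

On this bookkeeping, every two formula units of goethite carried down would yield one molecule of hydrogen, and every two formula units of FeO₂ lifted back up would yield one molecule of oxygen.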

These findings present new possibilities for
how hydrogen and oxygen form and circulate
inside Earth. When goethite (or a mixture
of haematite and water) is carried deep into
the lower mantle by subduction processes,
hydrogen and FeO₂ are formed (Fig. 1).
Hydrogen is extremely mobile and will spread
upwards, eventually escaping into space,
whereas heavy FeO₂ will sink to the bottom of
the lower mantle. But if the FeO₂ is lifted to the
upper part of the lower mantle by, for example,
an upwelling plume of hot rock, it will become
unstable and release oxygen gas on the way.
This means that large amounts of hydrogen
and oxygen might occasionally be produced
in the lower mantle.

No such possibility has previously been 
considered. As the authors claim, this process 
could have acted as an additional or alterna- 
tive oxygen source for the Great Oxidation 
Events — the periods in Earth’s history when 
the atmosphere became oxygenated. Until 
now, it was thought that the oxygen was sup- 
plied by biological activity alone. 

If hydrogen is released by the subduction 
of goethite, how will it behave at great depths? 
Unfortunately, hydrogen is invisible to X-rays 
and to electron microscopy, which makes its 
behaviour difficult to study at the atomic scale. 
Neutron diffraction is a powerful probe for 
directly observing hydrogen, and a facility that
enables this technique to be used at high pressures
and temperatures has successfully tracked
the movement of hydrogen within materials.
Such techniques could be used to study the
behaviour of hydrogen in Earth’s interior.

And what is the fate of FeO₂ when it sinks
to the bottom of the lower mantle? Just like
the finding that bridgmanite adopts an
unexpected dense phase at pressures greater
than 120 GPa, Hu and colleagues’ work suggests
explanations for the structural complexity
of the region called the D″ layer near the
core-mantle boundary. Further studies are
required to address this issue, and to work
out how hydrogen and oxygen circulate in the
deep Earth.


and back again


Deadly coral snakes warn predators through striking red-black banding. New data 
confirm that many harmless snakes have evolved to resemble coral snakes, and 
suggest that the evolution of this Batesian mimicry is not always a one-way street. 


DAVID W. PFENNIG 


Many species that are dangerous or
unpleasant to eat have evolved conspicuous
signals that warn predators
to avoid them. Not surprisingly, many other
species that are edible to predators, from birds
and butterflies to salamanders and sea slugs,
have evolved to resemble these inedible species.
By doing so, the ‘mimics’ receive protection
from predation, just like their ‘models’.
This phenomenon is known as Batesian mimicry
after the explorer and naturalist Henry
Walter Bates, who first described it. Batesian
mimicry has long fascinated evolutionary biologists,
and it is widely used to illustrate the
power of natural selection to produce remarkable
adaptation. Yet we still do not know how
common Batesian mimicry is, what its role is
in evolutionary diversification, nor whether it
can be reversed. Writing in Nature Communications,
Davis Rabosky et al. present findings
on mimicry of coral snakes that go a long way
towards answering these questions.

In 1867, the naturalist Alfred Russel Wallace
suggested that the striking resemblance between
deadly coral snakes and numerous harmless
species of red-black-banded (RBB) snakes
reflected Batesian mimicry (Fig. 1). However,
whether coral-snake mimicry actually occurs
has been questioned ever since, primarily
because of the (presumed) non-concordance
in the geographical distributions and abundances
of coral snakes and their mimics. Several
studies have attempted to address this issue;
most notably, it has been demonstrated that
predators avoid artificial snakes that have RBB
patterns, but only in geographical regions
where coral snakes occur, exactly as predicted
by the mimicry hypothesis.

© 2016 Macmillan Publishers Limited. All rights reserved.

Davis Rabosky and colleagues focus on this 
system, but present a more comprehensive 
study than these earlier investigations. By inte- 
grating colour-pattern, distribution and phylo- 
genetic data across all ‘New World’ species of 
snake, they show that evolutionary shifts to 
RBB patterns in coral snakes and numerous 
non-venomous species are highly correlated 
in space and time. Indeed, they find that every 
origin of the RBB pattern in non-venomous 
snakes occurred only after that particular
lineage and coral snakes were present together 
in the New World. Thus, in every case, the 
warning signal arose first in the model, then 
in the mimic, which is a key prediction of 
Batesian-mimicry theory. These data should 
therefore lay to rest any doubts about whether 
coral-snake mimicry does occur. 

The authors’ work also shows that 
coral-snake diversity strongly predicts (and 
substantially increases) the number of mimic 
species in a given geographical area. Indeed, 
their data suggest that the ‘mimicry excess’ 
problem is even greater than has been his- 
torically assumed, with up to six times more 
mimetic than model species present in a given 
locality, and many more than would be expected 
if RBB snakes were distributed randomly across 
the New World. These data are at odds with the 
long-standing theoretical expectation that 
Batesian mimics should be rarer than their 
toxic models. However, this expectation might 
not apply with a highly toxic model, such as the 
coral snake. When the model is highly toxic, 
the fitness costs of mistakenly attacking it would 
probably be so severe that predators would be 
under strong selection to avoid such a model 
(and any lookalikes), even if the model is rare.
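
The logic of this argument can be illustrated with a toy expected-payoff calculation for a predator that cannot tell model from mimic. Everything in the sketch is an assumption made for illustration: the payoff units, the cost values and the use of a one-in-seven encounter rate, which simply borrows the six-to-one species ratio mentioned above and is not a measured ratio of individuals.

```python
def should_avoid(model_fraction, cost_of_attacking_model, benefit_of_eating_mimic):
    """Return True if attacking a red-black-banded snake has a negative expected payoff.

    model_fraction is the chance that a banded snake is a real coral snake.
    """
    expected_payoff = (
        (1 - model_fraction) * benefit_of_eating_mimic
        - model_fraction * cost_of_attacking_model
    )
    return expected_payoff < 0

# Hypothetical scenario: one banded snake in seven is a genuine coral snake.
model_fraction = 1 / 7

# A mildly unpleasant model (cost 2, in the same arbitrary units as the benefit).
print(should_avoid(model_fraction, cost_of_attacking_model=2, benefit_of_eating_mimic=1))

# A potentially lethal model (cost 100).
print(should_avoid(model_fraction, cost_of_attacking_model=100, benefit_of_eating_mimic=1))
```

With these made-up numbers, avoidance pays only when the cost of a mistake is more than six times the benefit of a meal, so a sufficiently dangerous model can protect mimics that outnumber it.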

Figure 1 | Protective imitation. Many species of harmless snake, such as the false coral snake, Erythrolamprus aesculapii (left), have evolved the red-black-banded
(RBB) colour pattern of highly venomous coral snakes, such as the Brazilian coral snake, Micrurus brasiliensis (right). Davis Rabosky et al. show that
this RBB pattern has evolved in multiple lineages of non-venomous snakes, but only after each lineage and coral snakes were present together in the New World,
supporting the long-standing Batesian-mimicry hypothesis.

Another advance that stems from this work
is the authors’ proposal that mimicry might
not represent an evolutionary end point. In
particular, their data suggest that not only have
evolutionary transitions from cryptic (non-mimetic)
patterns to RBB (mimetic) patterns
occurred frequently in non-venomous
snakes, but so also have transitions from
mimetic patterns back to cryptic patterns. Most of
these losses of mimicry occurred in the tropics,
where coral snakes are continuously distributed,
suggesting that these losses occurred even
among species that live in the same area as coral
snakes. This is an intriguing conclusion. Generally,
mimicry has been viewed as a one-way
street; it is not clear why a species should ever
lose mimicry once it has evolved it, particularly
if its model is still around.

This suggestion will no doubt motivate 
further studies to determine how transi- 
tions between mimicry and cryptic pattern- 
ing occur. Evolutionary biologists have long 
debated whether Batesian mimicry could 
evolve through a gradual process of incremental
evolution, and many of these arguments
should apply equally to its loss. In particular,
it is unclear how a population could transition
from an ancestral cryptic colour pattern
to a derived mimetic one (or vice versa) if the
population must pass through a phase in 
which it expresses a colour pattern that is inter- 
mediate between these two extremes. Such an 
intermediate colour pattern would be expected 
to be disfavoured, because it should fail to 
receive the fitness benefits of either cryptic 
patterning or mimicry. 

Batesian mimicry has been called “the
greatest post-Darwinian application of Natural
Selection”. Davis Rabosky and colleagues’
study has settled some questions regarding the
specific example of coral-snake mimicry, and it
opens the door to answering several others.
Microbial signals to the
brain control weight 


The bacteria that inhabit the rodent gut promote insulin secretion and food 
intake by activating the parasympathetic nervous system — a hitherto unknown 
mode of action for this multifaceted microbiota. SEE ARTICLE P.213 


MIRKO TRAJKOVSKI & CLAES B. WOLLHEIM 


We live in symbiosis with trillions of
bacteria that populate our intestines,
known collectively as the gut
microbiota. These microbes influence many
physiological processes in our bodies, from
gut and immune maintenance to neurological
regulation. On page 213 of this issue, Perry
et al. highlight a previously unknown role
for the gut microbiota in stimulating insulin
secretion by signalling to the brain. Moreover,
the authors report that these microbes influence
appetite, providing a hint as to how the
microbiota might provoke obesity.

Mammals have evolved several responses
to energy scarcity. As a result of these adaptations,
obesity can arise in conditions of
constant food abundance. This response is
mediated by the hormone insulin, which is
secreted from pancreatic β-cells in response to
increased blood-glucose levels. Insulin tightly
controls energy balance by enhancing cellular
lipid synthesis and glucose uptake, causing
calorie storage.

Investigating the effects of a high-fat diet,
Perry and colleagues found that production
and turnover of the short-chain fatty acid
(SCFA) acetate were markedly increased in rats
on a high-fat diet compared with animals fed
a normal diet. Moreover, infusing the stomachs
of rats on a normal diet with acetate for
ten days increased glucose-stimulated insulin
secretion (GSIS).

Although glucose is the main stimulus for
insulin secretion, the process is also under
the control of the parasympathetic nervous
system, the part of the autonomic nervous
system that stimulates ‘rest-and-digest’ and
‘feed-and-breed’ processes. Parasympathetic
activity is largely mediated by the vagus nerve,
which sends motor inputs to many organs and
is responsible for slowing heart rate, and for
regulating gastrointestinal movement and the
digestion of food, in addition to enhancing
insulin secretion. Perry et al. demonstrated
that the ability of acetate infusion to increase
GSIS could be blocked by administering the
parasympathetic blocker molecules atropine
or methylatropine, or by surgically severing
one or more of the branches of the vagus nerve
that connects to the gut. These results indicate
that an acetate-induced increase in GSIS is
controlled by the parasympathetic nervous
system.

Further supporting the role of the
parasympathetic nervous system in acetate-mediated
GSIS, the authors demonstrated
that acetate could not stimulate insulin secretion
from isolated β-cell-containing pancreatic
islets in vitro. This is consistent with some, but
not all, previous investigations into a direct
effect of acetate on β-cells (for a review, see
ref. 5). Acetate administration into either the
brain’s ventricular system or a vertical column
of grey matter embedded in the brainstem
(both of which feed into the parasympathetic
nervous system) increased GSIS, again highlighting
the central-nervous effects of acetate.

Next, Perry et al. investigated the effects
of increased acetate turnover on appetite. A
chronic increase in acetate turnover promoted
a constant drive to eat, known as hyperphagia,
probably mediated by the ‘hunger hormone’
ghrelin, levels of which were elevated
in the hyperphagic rats compared with controls.
The hyperphagic rats developed obesity,
probably owing to a combination of increased
secretion of ghrelin and insulin.

Because SCFAs are products of bacterial
fermentation, Perry and co-workers investigated
the role of the gut microbiota in acetate
turnover. The gut microbiota co-develops with
the host and modulates whole-body metabolism
by affecting energy balance. The authors
transplanted faecal matter from donor rats on
a normal or high-fat diet into recipients on
the opposing diet, and found that the acetate-turnover
rate, faecal acetate levels and GSIS
levels from the donor group were transferred to
the recipients, implying that it is changes in the
microbiota that regulate these factors. Furthermore,
conditions of microbiota depletion (seen
in germ-free mice, which lack a microbiota, or
in rats treated with antibiotics) completely suppressed
acetate turnover and decreased ghrelin
levels compared to control mice, changes that
were associated with two- and fivefold lower
skeletal-muscle fat content, respectively.

These data suggest a mechanistic link
between the onset of obesity and the gut microbiota.
The microbiota-mediated increase in
acetate turnover that occurs during exposure
to a high-calorie diet might mediate a feedback
loop between the gut microbiota and
parasympathetic nervous system, promoting
hyperphagia owing to increased ghrelin
secretion, and increased energy storage as fat
owing to increased GSIS (Fig. 1). However, this
mechanism does not explain the observation
that microbiota-depleted mice do not show
suppressed food intake. It is also intriguing that
supplementation of the diets of rats with two
other SCFAs, butyrate and propionate, improves
host physiology and glucose metabolism, which
in the case of propionate seems to be mediated
by vagus-nerve stimulation by the peripheral
nervous system. This might indicate that the
site of stimulation, central or peripheral, is
relevant for SCFA-mediated effects in the parasympathetic
nervous system, and points to the
need for further exploration of the general role
of SCFAs in regulating obesity.

Figure 1 | A mechanism for microbiota-mediated weight gain. Perry
et al. report that, in rodents, production of acetate molecules from
dietary nutrients by the bacteria that colonize the gut (the microbiota)
increases the brain’s stimulation of the parasympathetic nervous system,
which includes the vagus nerve. Signals from the vagus nerve trigger
secretion of the ‘hunger hormone’ ghrelin from the stomach, leading
to increased food intake. The vagus nerve also potentiates glucose-stimulated
insulin secretion from β-cells in the pancreas, promoting
calorie storage and fat gain. In this way, the gut microbiota influences

For instance, follow-up work could address 
whether the effects in the brain are mediated 
by the SCFA receptor proteins FFA2 and FFA3, 
and clarify the controversy regarding the
direct effects of acetate on the β-cells. In addition,
transplantation of the microbiota from
rodents on a high-fat diet or from humans who 
are obese to germ-free rodents fed a normal 
diet could allow researchers to further test 
for a causal link between specific obesity- 
associated changes brought on by microbiotic 
acetate production and the development of 
metabolic syndrome (which involves obesity, 
insulin resistance, abnormal lipid levels in the 
blood and glucose intolerance). Analysis of 
how the genomes of the microbiota collectively 
change in rodents on a high-fat diet would 
allow researchers to identify acetate-producing 
microbes and to investigate their importance 
in the progression of diet-induced obesity. 

Clinical trials have shown that vagus-nerve
blockade by electrodes can help to reduce body
weight and improve blood-glucose control in
people with obesity. Moreover, specific antimicrobials
and phage therapies, as well as
faecal or bacterial transfers, have attracted
renewed interest in the past few years as potential
tools to treat antibiotic-resistant enteritis
(inflammation of the intestine) and ulcerative
colitis (long-term inflammation of the colon
and rectum). In the context of the increased
global prevalence of obesity, Perry and colleagues’
study might inform the development
of such strategies for suppressing acetate or
acetate-producing microbes as a means to treat
obesity and diabetes.
No turning back for
motorized molecules 


Two molecular motors have been developed that use chemical energy to drive 
rotational motion in a single direction. The findings bring the prospect of devices 
powered by such motors a tantalizing step closer. SEE LETTER P.235 


JONATHAN CLAYDEN 


The conversion of chemical energy to
mechanical motion drives movement
in all living things, from bacteria to
whales. An intricate array of molecular ratchets
and motors allows cells to extract mechanical
work from chemical reactions, for example to
drive muscle contraction, or to twist the helical
appendages that propel some bacteria. Two
papers, one by Collins et al. in Nature Chemistry
and another by Wilson et al. on page 235, report
the design and construction of artificial molecular
motors that achieve the same outcome using
much simpler, purely synthetic structures. Both
pieces of work show that chemical reagents can
drive the unidirectional motion of one part of
a molecule (the rotor) relative to another (the
stator), and thus provide direct functional
analogues of biological motors.

It is not easy to design a synthetic molecular 
motor³. As was pointed out nearly 20 years
ago⁴,⁵, molecular motors are characterized by
movement that must be more than random 
Brownian motion. Furthermore, angular 
momentum cannot be used to maintain a con- 
stant directionality on the molecular scale as it 
can in everyday electric motors. The thermo- 
dynamic landscape of a molecular system must
be repeatedly altered to force concerted move- 
ment in a single direction, to prevent mere 
shuttling forwards and backwards between 
two states. The greatest successes in the field so 
far have used light energy to drive a molecular 
system away from equilibrium, followed by a 
directionally defined relaxation process; motors 
capable of megahertz rotational speeds have 
been designed and built using this approach⁶.

The two new motors both use chemical 
energy to drive rotation. Collins and col-
leagues’ motor is remarkably simple in
conception. The rotor and stator are each a
benzene ring, connected by a single bond that 
forms a rotatable axle. Systems of this sort can 
rotate freely about the axle, but rotation in the 
authors’ motor is partly restricted by groups or 
atoms attached next to the bond that connects 
the two benzene rings. 

Collins et al. added alternating sets of 
reagents to a solution containing their motor, 
which allowed first one side of the rotor ring and
then the other to slip past a sulfur-containing 
group (a sulfoxide; Fig. 1) bonded to the stator. 
The alternating reagents insert a palladium 
atom into a carbon-hydrogen (C-H) bond on 
one side of the rotor, and then into a carbon- 
bromine (C-Br) bond on the other. Palladium’s 
affinity for the sulfur atom of the sulfoxide 
(SOR) group lets it form a bridge between the 
rings that lowers the energy barrier to rotation, 
allowing the rings to slip past one another. On 
its own, shuttling the metal between the C-H 
and C-Br bonds would simply cause random 
rotation clockwise or anticlockwise, but the 
chirality (handedness) of the sulfoxide group 
imparts directionality to the slippage mecha- 
nism, and so also to the rotation of the motor. 
The alternating C-H and C-Br insertions 
needed to drive this process require palladium 
to be in the +2 and 0 oxidation states, respec- 
tively. This means that, in its current form, the 
motor cannot work autonomously, because 
different reaction conditions are needed to 



Figure 1 | A unidirectional molecular motor incorporating a rotating axle. Collins et al.¹ report a system
consisting of two benzene rings (green hexagons) connected by a single bond. One ring acts as a rotor,
and has a hydrogen atom on one side and a bromine atom on the other. The other ring is a stator and has
a sulfoxide group on one side and a fluorine atom on the other (fluorine atom not shown because it is not
involved in the motor mechanism). The connecting bond acts as an axle. The rings are also viewed here
from above, along the axis of the axle (top right in each panel). a, The system’s rotation cycle begins with the
rotor and stator perpendicular to each other. b, Addition of a palladium(II) reagent allows the side of the
rotor carrying the hydrogen atom to pass the sulfoxide. A palladium atom bridges the two rings. c, The rings
then relax to the alternative perpendicular arrangement. d, Conversion of palladium(II) to palladium(0)
allows the side of the rotor carrying the bromine atom to pass the sulfoxide group, and a palladium atom
again bridges the rings. The cycle continues if reagents are added to toggle the palladium between the two
oxidation states. Br, bromine; SOR, sulfoxide (where R is a benzene-ring-containing group); Pd, palladium.


Figure 2 | A unidirectional molecular motor incorporating a ring travelling round a track. a, Wilson
et al.² report a system in which a small molecular ring is threaded onto a larger one (the track). The track
has two ‘stations’ at which the ring can dock, and two molecular groups that act as signals and can be set
to ‘stop’ or ‘go’. When the ring is docked at a station, its proximity to the nearby signal forces that signal
to stay in the ‘go’ position, allowing the ring to travel to the second station. b, When the ring docks at the
second station, the first signal changes to ‘stop’, preventing reversal of the direction of travel, while the
second signal switches to ‘go’, allowing the ring to carry on around the track to the first station. The signals
are switched from ‘go’ to ‘stop’ using the reagent Fmoc-Cl.


shuttle the palladium between these states. 
However, metal redox processes can be driven 
electrochemically, raising the intriguing pos- 
sibility that future versions of the motor could 
be electrically powered. 

Wilson and colleagues’ chemically powered 
motor overcomes the autonomy problem by 
using a different, more complex design than 
that of Collins and co-workers. In Wilson and 
colleagues’ motor, a small ring is threaded 
onto a larger one (the track), and travels like 
a train around the track by constantly advanc- 
ing in the same direction from one of two 
‘stations’ to the other. Crucially, and in contrast 
to Collins and co-workers’ motor, only one set
of reaction conditions is needed to drive the
motor forward: the authors use a reactive ‘fuel’
known as Fmoc-Cl, which continuously breaks 
down to carbon dioxide and other by-products 
as the motor runs. 

The small ring’s progress is powered by a 
mechanism that channels random kinetic 
motion into movement in a single direction 
(Fig. 2). Immediately after each station, an 
unstable carbonate group (which becomes 
attached to the track by reaction with Fmoc-Cl) 
provides a ‘stop’ signal. If the carbonate group 
is removed, the signal switches to ‘go’. The
authors designed the chemistry of the system
such that the signal switches from ‘stop’ to ‘go’
at a more or less constant rate at both stations,
but changes from ‘go’ to ‘stop’ more rapidly after
the small ring has passed through to the other 
station. The stop signal therefore tends to fol- 
low the train around the track, ensuring that 
forward motion is always faster than reverse. 
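
The logic of this ratchet can be illustrated with a toy stochastic simulation. The sketch below is not the authors' kinetic model: the two-station, two-gate layout follows the description above, but every rate constant, the event count and the function name simulate_ratchet are assumptions chosen only for illustration. A gate reopens at the same rate wherever the ring is, but closes faster once the ring has moved on to the other station. With that asymmetry the ring should accumulate net forward hops, whereas with position-independent closing the walk has no preferred direction.

```python
import random


def simulate_ratchet(k_close_near, k_close_far, k_open=1.0, k_hop=1.0,
                     n_events=20000, seed=0):
    """Toy model of a ring hopping between two stations on a circular track.

    Gate i sits just clockwise of station i and is either open ('go') or
    closed ('stop'). All rates are illustrative and in arbitrary units.
    Returns net clockwise station-to-station hops (forward minus backward).
    """
    rng = random.Random(seed)
    ring = 0                  # station currently holding the ring (0 or 1)
    gate_open = [True, True]  # state of the gate just after each station
    net_forward = 0

    for _ in range(n_events):
        front = ring       # gate just ahead of the ring, next to its station
        back = 1 - ring    # gate just after the other station
        # Candidate events as (kind, argument, rate).
        events = [
            ("toggle", front, k_close_near if gate_open[front] else k_open),
            ("toggle", back, k_close_far if gate_open[back] else k_open),
        ]
        if gate_open[front]:
            events.append(("hop", +1, k_hop))  # clockwise, through front gate
        if gate_open[back]:
            events.append(("hop", -1, k_hop))  # anticlockwise, through back gate

        # Choose one event with probability proportional to its rate.
        weights = [rate for _, _, rate in events]
        kind, arg, _ = rng.choices(events, weights=weights, k=1)[0]

        if kind == "toggle":
            gate_open[arg] = not gate_open[arg]
        else:
            net_forward += arg
            ring = 1 - ring  # the ring is now docked at the other station
    return net_forward


if __name__ == "__main__":
    biased = simulate_ratchet(k_close_near=0.1, k_close_far=10.0)
    unbiased = simulate_ratchet(k_close_near=1.0, k_close_far=1.0)
    print("net forward hops, position-dependent closing:", biased)
    print("net forward hops, position-independent closing:", unbiased)
```

Two hops make one lap of the track, so halving the returned value gives net laps. The model counts events rather than elapsed time, which is enough to show the direction of the bias but says nothing about real speeds.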

The choice of Fmoc-Cl as the fuel is ingen- 
ious, because the chemical mechanisms that 
involve Fmoc-Cl in switching from ‘stop’ to ‘go’ 
and vice versa are different, which means that 
the rates of the switching steps can be inde- 
pendently controlled. The energy that powers 
the constant forward movement of the train is 
provided by the consumption of Fmoc-Cl, so 
the train keeps moving until all of the Fmoc-Cl 
has been consumed. 

Wilson and colleagues’ work constitutes 
an important step in the construction of a
chemically propelled, autonomous molecular 
device, but there is still a long way to go. The 
small ring typically takes 12 hours to travel 
around the track, and the Fmoc-Cl fuel is used 
rather inefficiently — the chemical conditions 
required for the fuel to power the motor also 
cause the fuel to decompose wastefully. Both 
motors currently work in solution, with at 
least 10'* molecules working in tandem. But
the translation of chemical energy into macro- 
scopic motion is likely to require motor com- 
ponents to be constructed in the solid phase, 
and to be individually controllable. 
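
To put the 12-hour lap time in perspective, it can be converted to a frequency and set against the light-driven motors mentioned earlier. In the short calculation below, the 12-hour figure is from the text; the 1 MHz value is an assumed round number standing in for "megahertz rotational speeds", and the variable names are illustrative. The two motors move in different ways, so this is only a rough comparison of scale.

```python
SECONDS_PER_HOUR = 3600

lap_time_s = 12 * SECONDS_PER_HOUR   # typical lap time quoted in the text
chemical_motor_hz = 1 / lap_time_s   # laps per second for the fuelled ring

light_driven_hz = 1e6                # assumption: 1 MHz as a representative value

speed_ratio = light_driven_hz / chemical_motor_hz

print(f"chemically fuelled ring: {chemical_motor_hz:.2e} laps per second")
print(f"assumed light-driven motor is about {speed_ratio:.1e} times faster")
```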

The story of artificial molecular motors 
is still in its opening pages, but chemists’ 
attempts to mimic cellular motors reveal 
how many challenges biological systems have 
overcome to evolve the machinery that pow- 
ers movement. The design principles that work 
are becoming clearer, however, and although 
the possibility of molecular motors routinely 
powering artificial devices in the future is still 
distant, it is now distinct.
Homo floresiensis


New fossil findings demonstrate that the diminutive hominin Homo floresiensis 
lived on the Indonesian island of Flores at least 700,000 years ago, and may point 
to its rapid dwarfism from the larger Homo erectus. SEE LETTERS P.245 & P.249 


AIDA GOMEZ-ROBLES 


Since the first description of Homo
floresiensis in 2004 (ref. 1), these little 
hominins from the Indonesian island of 
Flores have raised very big questions. Do these 
skeletal remains represent a new species in the 
extinct hominin family, or are they modern 
humans who were pathologically dwarfed, or 
members of a short-stature population? If they
belong to a different species, what was its evo- 
lutionary origin? Why was it so different from 
other hominin species? The most common 
answer to these questions has been repeated 
for more than ten years: we need more remains 
from Flores — especially from different sites 
and older time periods — to tip the scales. On 
pages 245 and 249 of this issue, van den Bergh 
et al.² and Brumm et al.³ report the finding of
those long-awaited remains. 
After H. floresiensis was described, many 
palaeoanthropologists embraced the idea of a 
new and odd-looking hominin species that had
a diminutive brain and body size. Supporters
of the pathology hypothesis, however, have 
been unrelenting in looking for syndromes and 
conditions that could have been responsible for 
the unexpectedly small size of these hominins; 
some suggestions have been published quite 
recently⁴. The current findings (consisting
of a lower-jaw fragment, an indeterminate
cranial fragment and some small teeth from
at least three different individuals) confirm
beyond any reasonable doubt that H. floresien- 
sis is a distinct hominin species with deep 
evolutionary roots that trace back more than 
700,000 years. 

Van den Bergh and colleagues present 
detailed analyses of the size and shape of the 
fossils, found at the Mata Menge site on Flores, 
including comparisons with remains of other 
hominin species. They show that such tiny 
teeth are found only in Homo sapiens — whose 
origin and migration to Asia are substantially 
later than the age of the new fossils — and in 
H. floresiensis. Brumm and colleagues report
on the open-grassland habitat and stone
tools associated with these hominins. They 
describe these tools as technologically similar 
to the ones found with the later H. floresiensis 
individuals from the Liang Bua site, and sug- 
gest that this points to the behavioural stabil- 
ity of the hominins from Flores over a long 
period of time. In addition, Brumm et al. use a 
combination of dating techniques to provide 
evidence that the fossils were deposited around 
700,000 years ago, thus confirming the early 
origin of this species. 

Although this confirmation finally ends the 
debate about the validity of H. floresiensis as 
a species, its evolutionary origins are likely 
to remain under discussion for much longer. 
There are two main models (Fig. 1). H. flores- 
iensis may have evolved from the larger Homo 
erectus through a process of island dwarfing — 
an extreme reduction in size due to the absence 
of predators and to resource scarcity that is 
typical of island ecosystems. Alternatively, 
it may be descended from the earlier Homo 
habilis, or even from a small form of hominin 
from the Australopithecus genus. This second 
model implies that very primitive hominins 
would have left Africa by 2 million years ago, 
but there is no fossil or archaeological evidence 
for such an early dispersal. 

Mostly on the basis of the morphology of 
a lower molar tooth and of general affinities 
of the jaw fragment, van den Bergh and col- 
leagues claim that the remains from Mata 
Menge are more closely related to H. erectus 
than to H. habilis. The reliability of lower-
molar morphology for assessing species relation-
ships supports their claim⁵. However, the traits
that point to a more primitive ancestor for
H. floresiensis mostly come from body parts
other than the skull⁶,⁷ and cannot be assessed
using the Mata Menge sample, which does not 
include such postcranial remains. 

Without further fossil evidence, the discus- 
sion between proponents of the two models 
will continue. Some will think that extreme 
dwarfing from H. erectus is unlikely, espe- 
cially to the extent of the dramatic brain-size 
reduction observed in H. floresiensis⁸, although
empirical data from hippopotamuses suggest
that similarly strong brain reduction may
occur⁹. Others will argue that a long-distance
migration route for H. habilis, or an earlier 
form, from Africa to southeast Asia is even 
more implausible. For now, it seems that all 
possible explanations remain outside the 
comfort zone of classic scenarios of human 
evolution. 

Van den Bergh et al. propose that the homi- 
nins from Mata Menge might be descended 
from the hominins that made stone tools at 
the site of Wolo Sege¹⁰, also on Flores, which
is dated to approximately 1 million years ago. 
They further hypothesize that large-bodied 
H. erectus hominins are the ones that made 
these tools. This speculation could be proved 
wrong if remains from other small hominins 


Homo habilis: height 118 cm; weight 33 kg; brain 614 cm³
Homo erectus: height 165 cm; weight 51 kg; brain 860 cm³
Homo floresiensis: height 106 cm; weight 28 kg; brain 426 cm³


Figure 1 | Candidates for the ancestry of Homo floresiensis. There are two main models for the
evolutionary origin of the hominin species H. floresiensis, which inhabited the Indonesian island of Flores
and had a particularly small brain and body. One possibility is that Homo habilis, or a similar form that
also had a relatively small body and brain, may have left Africa by 2 million years ago and reduced in size
even further. But there is no evidence for such early hominins outside Africa. Alternatively, H. floresiensis
may be descended from the later and larger-bodied Homo erectus, for which there is evidence on Java
around 1 million years ago and earlier. This second model would involve much greater body and brain
reduction over a much shorter period of time. (Data on brain and body size are from refs 12-14, and
are based on east African specimens for H. habilis, on early Indonesian specimens for H. erectus and on
remains from Liang Bua for H. floresiensis.)


were found with the tools in the future, but it 
raises an interesting question: can the extreme 
reduction of the brain and body of H. floresien- 
sis have evolved over a mere 300,000 years? 

Three hundred millennia may not seem a 
‘short’ period of time to many readers. How- 
ever, no other such dramatic transformation in 
hominin evolution is known to have occurred 
over a similarly brief timescale. A quantita- 
tive analysis and comparison of evolutionary 
rates across different hominin species and with 
H. floresiensis would lend formal support to 
this informal observation. Alongside such 
quantification, it might be helpful to look at 
more-distant species. Some mammals show 
evidence of even stronger degrees of dwarfing 
over substantially shorter periods of time, and 
extremely fast rates of size reduction in island 
environments⁹,¹¹. In addition, we must not rule
out the possibility that the direct ancestors of 
H. floresiensis were not the most typical repre- 
sentatives of their species. Indeed, the strange 
combination of primitive and derived traits in 
H. floresiensis anatomy could be the result of 
a pronounced founder effect, which occurs 
when a new population is established from a 
small sample that does not reflect the paren- 
tal population's diversity and most-common 
traits. 
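
One standard way to begin such a quantification is to express each change in darwins, a unit in which 1 darwin corresponds to a change by a factor of e per million years. The sketch below applies it to the H. erectus and H. floresiensis values given in Figure 1, under the hypothetical scenario discussed above in which the whole change happens within 300,000 years. It is only a back-of-envelope illustration: the H. floresiensis measurements come from the much later Liang Bua remains rather than from Mata Menge, and the function and variable names are assumptions.

```python
import math


def rate_in_darwins(start_value, end_value, elapsed_myr):
    """Proportional change per million years: ln(end / start) / time."""
    return math.log(end_value / start_value) / elapsed_myr


ELAPSED_MYR = 0.3  # from about 1 million to about 700,000 years ago

# (H. erectus value, H. floresiensis value), as quoted in Figure 1.
TRAITS = {
    "height (cm)": (165, 106),
    "weight (kg)": (51, 28),
    "brain volume (cm^3)": (860, 426),
}

rates = {name: rate_in_darwins(start, end, ELAPSED_MYR)
         for name, (start, end) in TRAITS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} darwins")
```

Negative values indicate reduction. Rates computed this way depend strongly on the time interval chosen, so they are best compared with rates measured over similar intervals in other lineages.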

Some scenarios that look mind-blowing 
from our anthropocentric point of view 
become underwhelmingly conventional 
when we expand our horizons. Rapid island 
dwarfism is not extraordinary in nature, 
nor is the founder effect or the long-scale
migration of species that lack human-like 
cognitive abilities. Whatever the actual origin of 
H. floresiensis, we will be much closer to an 
answer if we look beyond hominins in our 
search for explanations.


Aida Gomez-Robles is in the Center for the 
Advanced Study of Human Paleobiology, 
Department of Anthropology, The George 
Washington University, Washington DC 
20052, USA. 

e-mail: agomezrobles@gwu.edu 


1. Brown, P. et al. Nature 431, 1055-1061 (2004).
2. van den Bergh, G. D. et al. Nature 534, 245-248 (2016).
3. Brumm, A. et al. Nature 534, 249-253 (2016).
4. Henneberg, M., Eckhardt, R. B., Chavanaves, S. & Hsu, K. J. Proc. Natl Acad. Sci. USA 111, 11967-11972 (2014).
5. Gómez-Robles, A., Bermúdez de Castro, J. M., Martinón-Torres, M., Prado-Simón, L. & Arsuaga, J. L. J. Hum. Evol. 82, 34-50 (2015).
6. Jungers, W. L. et al. Nature 459, 81-84 (2009).
7. Tocheri, M. W. et al. Science 317, 1743-1745 (2007).
8. Martin, R. D., MacLarnon, A. M., Phillips, J. L. & Dobyns, W. B. Anat. Rec. 288A, 1123-1145 (2006).
9. Weston, E. M. & Lister, A. M. Nature 459, 85-88 (2009).
10. Brumm, A. et al. Nature 464, 748-752 (2010).
11. Evans, A. R. et al. Proc. Natl Acad. Sci. USA 109, 4187-4190 (2012).
12. Kubo, D., Kono, R. T. & Kaifu, Y. Proc. R. Soc. B 280, 20130338 (2013).
13. Grabowski, M., Hatala, K. G., Jungers, W. L. & Richmond, B. G. J. Hum. Evol. 85, 75-93 (2015).
14. Baab, K. L. J. Anthropol. Sci. 94; see go.nature.com/jdyyt9 (2016).




9 JUNE 2016 | VOL 534 | NATURE | 189 


PERSPECTIVE 


doi:10.1038/nature18285 


Accounting for reciprocal host-microbiome
interactions in experimental science


Thaddeus S. Stappenbeck¹ & Herbert W. Virgin¹


Mammals are defined by their metagenome, a combination of host and microbiome genes. This knowledge presents
opportunities to further basic biology with translation to human diseases. However, the now-documented influence of 
the metagenome on experimental results and the reproducibility of in vivo mammalian models present new challenges. 
Here we provide the scientific basis for calling on all investigators, editors and funding agencies to embrace changes 
that will enhance reproducible and interpretable experiments by accounting for metagenomic effects. Implementation 
of new reporting and experimental design principles will improve experimental work, speed discovery and translation, 
and properly use substantial investments in biomedical research. 


Recent studies highlight that differences in human genetics explain
only a fraction of observed human phenotypic variation¹,².
Another well-recognized source of variation is the microbiome,
which is the collection of all host-associated microorganisms. It includes 
not only the bacterial microbiome but also the virome, archaea, the myc- 
obiome (fungi) and meiofauna (for example, protists and helminths)*°. 
While the majority of the microbiome is located in the gastrointestinal
tract, the skin, oral cavity and genitourinary tract are also colonized by
complex sets of organisms”. 

In some cases phenotypes are dominantly controlled by genes in either 
the microbiome or the host (Fig. 1). However, we increasingly recognize 
that the effects of host and microbial genes on a phenotype are in many 
cases interdependent. Such examples include effects of specific microbes 
(that is, a virus or bacteria) that only occur in defined host genetic back- 
grounds'”-!>, We and others have referred to this concept as ‘host gene plus 
microbe’, but a more concise descriptor is that these effects are determined
by the metagenome. We herein define the metagenome as the sum of all 
host genes plus all organism genes of the microbiome. The combinatorial 
effects of host and microbial genes are herein termed metagenomic effects. 
Thus, as a driver of phenotypes, one must consider the metagenome, in 
addition to dominant effects of either host or microbiome genes (Fig. 1). 

We now appreciate that nearly all aspects of human physiology, as well 
as model organisms such as mice, are influenced by the microbiome and 
metagenome. Though the implications for this work are exciting, it has 
generated a series of unintended challenges in mammalian experimen- 
tal biology, especially as related to experimental reproducibility (Box 1). 
A core issue is a lack of accounting for the microbiome and metagenome, 
which in turn leads to inconsistent experimental design and interpreta- 
tion. An underappreciated aspect of the microbiome and metagenome 
effects is that they are broadly relevant and impact many diverse fields. 
Below, we address key scientific aspects of this rapidly evolving and excit- 
ing area, the relevance and impact of the microbiome and metagenome 
on physiology and disease as well as specific environmental variables that 
influence mammalian physiology. We then highlight critical challenges 
that are created by the influence of the microbiome and metagenome and 
propose solutions that require the commitment of many diverse stake- 
holders, including scientists in all areas of investigation, educational insti- 
tutions, funders of research and journals. 


Metagenome effects central to experimental biology 

The recognition that the microbiome and metagenome are critical factors
in human health is not a new concept. Elie Metchnikoff, over 100 years 
ago, recognized the importance of the microbiome and performed exten- 
sive experiments in this area’. His thoughts and work foreshadowed the 
need for new tools to disentangle the complex intestinal microbiome as 
well as the profound influences of the gut microbiome on systemic organs. 
The latter concept has recently been demonstrated by the linkage of 
specific intestinal bacterial metabolites to atherosclerosis’’. 

Important associations between the microbiome and disease have been 
made. In many cases, associations in mouse models of disease mirror 
effects observed in humans. These studies support the idea that the micro- 
biome and metagenome have profound local as well as systemic effects, 
both positive and negative in disease and on physiologic processes related 
to disease. Examples include inflammatory bowel disease!*-!, AIDS”, 
arthritis”®, nutrition?”-*’, graft versus host disease*’, obesity and metab- 
olism?’, diabetes*!, haematopoiesis*, brain function®*~*°, cancer’, bone 
mass** and treatment with immunomodulatory drugs*’. Thus, altera- 
tions of the overall taxonomic complexity, as well as the representation 
of specific taxa, are associated with a wide range of disease states. Notably 
these studies show that the microbiome influences systemic organs as 
well as mucosal organs; no area of physiology is likely to be independent 
of these influences. 

These correlative studies do not demonstrate cause and effect, which has led
to the development of experimental systems, many in mice, to test
mechanistic hypotheses. Improved tools for metagenome analysis, new 
microbial culture methods and standard studies of host-pathogen and 
host-commensal interactions have led to the development of novel ideas 
explaining disease pathogenesis. This has allowed investigators to test for 
the role of specific microbes or microbial products. For example, microbes 
and microbial metabolites (i) affect T cell and macrophage differentiation 
to impact disease states*°-*; (ii) trigger inflammatory phenotypes in the 
intestine of genetically susceptible mice“! (iii) alter vascular and crypt 
structure and development in the intestine*~”; (iv) alter anti-viral T cell 
responses"; (v) reactivate herpesvirus from latency’; and (vi) affect car- 
diovascular disease*°, asthma*!*?, and infectious disease pathogenesis. 
Further, sequential infection of barrier-raised mice with microbes related 
to common human infections (herpesviruses, influenza, intestinal 


Figure 1 | The role of the metagenome in determining phenotypes.
The metagenome is defined as the total host genome (depicted by purple
chromosomes) and the associated microbiome genome (depicted by
the green chromosomes). It is important to realize that phenotypes
can be dominantly driven by either host or microbial genes but also by
combinations of genes within the metagenome’**. 


helminth) alter gene expression patterns in the blood to partially resem- 
ble those observed in adult humans and can alter vaccine responses™. 
Together these studies portend the development of even more extensive 
data linking specific components of the microbiome to important physi-
ologic mechanisms also involving host genes and improve mouse models
as tools for discovery of disease mechanisms and therapeutics.


Microbiome influence on mouse phenotypes 

Mice are amongst the most commonly used model organisms. One com- 
mon concern is the lack of reproducibility of mouse experiments even 
within an apparently identical host chromosomal background. In our 
view this reflects that investigators simply do not consider all of the rele- 
vant genes (that is, the metagenome) and their interactions (Fig. 1). The 
advent of enhanced techniques to genetically modify mice has only accel- 
erated this problem in recent years. Clearly, a variety of environmental 
factors, in addition to mouse genetic background, such as ambient 
temperature, water treatment, diet, and light-dark cycles can also impact 
mouse phenotypes (see below). More recently, many of these same envi- 
ronmental parameters have been linked to alterations of the microbiome. 
Importantly, these new findings indicate an exquisite interplay between 
environmental variables, the microbiome, and animal phenotype, creating 
a veritable ‘witch’s brew’ of concerns about how to interpret what once 
might have been considered simple experiments. 

To clarify the compelling need to confront challenges presented by 
the emerging recognition of the importance of the microbiome and 
metagenome, we provide a non-exhaustive list of linkages of the micro- 
biome to variables known to influence experimental outcomes. In each of 
the cases below, investigators from one facility may not be able to compare 
their results to those of others if they are unaware of, or cannot duplicate, 
the conditions under which experiments were performed. This results 
in apparent lack of reproducibility of findings when in fact, there is not 
a problem with the experiments per se; rather there are unappreciated 
microbiome variables that influence experimental outcomes. 


Linkage of the microbiome to host characteristics presumed 
to be primarily driven by host genetics: 

Example 1. When phenotypes are passed from one generation to the 
next it is most commonly assumed that the mode of inheritance is via 
host chromosomes. The microbiome of mice is largely inherited from 
the mother (dam), a fact generally attributed to processes around
birth, co-housing and breast feeding*>**. Importantly, major phenotypic 
differences can be inherited (also called vertical transmission) from the 
dam in a microbiome-dependent fashion’”. A diagram representing this
is presented in Fig. 2 showing that the levels of IgA in faeces can be trans- 
mitted vertically (and thus be ‘heritable’) through the microbiome of the 
dam, and can also be transmitted horizontally from one mouse to another 
when IgA ‘high’ and IgA ‘low’ mice are co-housed. This phenomenon
also occurs with transmission of segmented filamentous bacteria within 
mouse facilities including vendors”. 

Example 2. Host strain background controls many experimental 
phenotypes and also plays a role in the microbiome composition®. 
Since the microbiome can control host phenotypes, findings attributed 
to genetic background may instead be due to effects of the microbiome. 

Example 3. The sex of mice used for experiments can impact
both phenotypes and microbiome. For example, the sex of mice alters 
diabetes and autoimmune phenotypes associated with differences in the 
microbiome?™™, 

Example 4. Many experimental phenotypes are related to the age 
of mice. Age can influence the microbiome®, suggesting that some of 
the effects of ageing may be in part due to changes in the microbiome. 
Attention should also be paid in the first two months of life, during which 
time the intestine undergoes rapid changes in development. 

Example 5. The microbiome undergoes profound circadian 
changes, and can control epithelial and immune cell homeostasis and 
metabolism. The light-dark cycle of facilities and time of day of exper- 
iments and sample collection are often not standardized or reported in 
manuscripts. 

Linkage of the microbiome to environmental variables known to influ- 
ence mouse biology: 

Example 1. Intercurrent infection of mice with known pathogens such 
as, to name but a few, Helicobacter species, murine norovirus, mouse 
rotavirus (epizootic diarrhoea of infant mice), pneumocystis and many 
others can alter results of experiments!*'>™, Testing for such pathogens 
is commonly unreported in publications. Variations in these infections 
between mice in different cages and in different facilities have profound 
biologic implications for reproducibility of results between different facil- 
ities and institutions. As an example, persistent norovirus infection of 
germ-free mice can emulate some of the immune-related developmental 
effects normally attributed to the commensal intestinal bacteria“, and 
bacteria are critical to the inflammatory effects induced by norovirus 
infection in mice carrying mutations in genes associated with risk for 
human inflammatory bowel disease'*)». 

Example 2. It is not often appreciated by investigators outside of metab- 
olism research that prevention of cold stress in mice requires an ambient 
temperature of 26-29°C. Most facilities maintain temperatures of 
20-24 °C for the comfort of the investigators®. To compensate for cold 
facilities, mice will titrate nesting material resulting in potential changes 
in mouse phenotype based on bedding availability®. Cold stress of mice 
can profoundly impact mouse physiology including elevating heart rate 
and alterations of basal metabolism driven in part by excessive
glucocorticoid production. Chronic glucocorticoid stimulation is immu-
nosuppressive, which in turn has a profound impact on many types of
experiments including responses to pathogens and the development 
of immunity. Chronic glucocorticoid treatment also changes the intes- 
tinal microbiome®, indicating that effects of cold stress might be on, 
or through, changes in the microbiome. This issue has been addressed 
experimentally by intentionally subjecting mice to extreme cold stress. 
This environmental perturbation alters the microbiome in mice, and 
this alteration can be transmitted to mice not exposed to cold tempera- 
tures. Importantly, the microbiome changes contribute to phenotypes®. 
Another conclusion from this body of work is that the common practice 
of transporting mice from mouse facilities to the laboratory in containers 
that do not include bedding or that travel through cold (or hot) environ- 
ments may alter results. 

Example 3. Water treatment and sterilization can alter both phenotypes 
and the microbiome. Some facilities rely on sterilization by autoclaving, 
while others rely on acidification. The pH of the water in some instances 
is ~2, which can alter both mouse phenotypes and the microbiome.
Box 1 | Challenges created by metagenomic influences on mouse phenotypes

Overall scope

* Failure to recognize the proven impact of the microbiome and metagenome on the acquisition and interpretation of experimental data in
research planning, grants and publications.

* Pervading view in many fields that issues related to the microbiome and metagenome affect only mucosal biology and immunity. This view
is not supported by recent findings that show the marked global impact of the microbiome on many areas of host physiology.

* The effects of the microbiome and metagenome are biologically, conceptually and bioinformatically complex, making them challenging to
quantify and analyse even if one accepts the need to account for them.

Experimental design

* Lack of rigorous accounting for environmental factors that influence the microbiome and metagenome in experimental design
and interpretation. This will hamper collaboration both within, and increasingly between, countries and continents, thereby limiting
development of global science by infusing lack of reproducibility of apparently similar experiments.

* Incomplete recognition of the fact that the metagenome and microbiome may affect experimental results via both vertical and horizontal
transmission.

* An erroneous (and common) assumption that descriptive studies involving sequencing and analysis of the intestinal bacterial microbiome
are sufficient to control for the effects of the metagenome.

* Lack of understanding of substantial variations between the mouse facilities and resources available to investigators at different institutions
to discern them.

Experimental analysis

* Lack of facile centralized databases for mouse phenotypes and microbiome sequence data sets that are comparable to those used in
structural biology or gene expression analysis to foster proper experimental design, analysis and comparison between data sets.

* Severely limited access of investigators to cutting edge computational tools and expertise needed for analysis of the microbiome. These
resources most often cannot be generated by individual research groups.

* Limits of single investigator ‘bandwidth’. The expertise of individual investigators is typically restricted to one or a few of the components of
the microbiome (for example, the virome, bacterial microbiome or mycobiome).

Administrative and organizational

* Failure to recognize and address these challenges cannot be attributed to a single group of stakeholders.

* Failure to recognize that solutions to these challenges cannot be implemented by a single group of stakeholders.

* Fully addressing the impact of the microbiome and metagenome on experimental findings is expensive and resource intensive.

* Delays in productivity attendant upon full evaluation of the microbiome and metagenome in experimental systems may influence
promotions and grant competitiveness, thereby inhibiting professional development and training.

* Lack of consensus across different stakeholder groups of how to deal with these challenges.

In this work we highlight the importance of the microbiome in altering mouse phenotypes. Importantly, these effects are not exclusively local
(that is, reserved to the intestine) but instead can have profound effects on diseases that occur in distant organs. These facts present many
challenges (listed here) for scientific enterprise. What is clear is that there is no simple, single-stakeholder solution. In the latter part of this
Perspective, we provide a series of experimental and policy recommendations to begin to overcome these obstacles. We feel that the enormous
potential for both scientific understanding and improving the human condition of defining mechanisms responsible for the impact of the
microbiome and metagenome on physiology and diseases will not be fully realized unless these challenges are addressed.


Example 4. Diet has profound effects that can outweigh host genetics.
The composition and handling of mouse chow matter: alterations in the
fat or simple-sugar content alter both the microbiome and mouse
phenotypes. Other additions to the diet, such as haem (to mimic red
meat), can impact inflammatory phenotypes and the microbiome.

Example 5. Mice are frequently transported from large dedicated
breeding/repository facilities, particularly to obtain wild-type controls. In
addition to probably mismatched microbiomes (because mice are from
different facilities), the travel-related stress to animals (for example, 
temperatures in trucks and planes, handling, time of travel) cannot be 
controlled, and it is not known how long mice take to adjust to the new 
environmental and microbial milieu of a facility. This complicates and 
may invalidate comparisons of mutant mice bred in a research facility 
with wild-type control mice from an outside vendor. 


The case for littermate controls 

Well-established methods exist to control for host, microbiome and
metagenomic effects (Fig. 1). These methods have an increasingly
appreciated scientific basis, but unfortunately they are not always used.
As an example, consider the common practice of comparing wild-type mice
to mice with a mutation when the wild-type and mutant mice are bred
separately or one group is purchased from an outside facility (Fig. 3).
The results of such comparisons are often reported in the literature as
conclusive evidence that a mouse genomic mutation caused the phenotype.
However, this is not a valid conclusion. When mice are bred separately,
the microbiome can confer apparently heritable phenotypes, and
phenotypes can be transferred between a mutant and a wild-type mouse by
co-housing. To complicate this scenario, both viruses and bacteria
can confer phenotypes in the presence of specific host gene mutations
through metagenomic effects.

A well-recognized solution to this problem is to use littermate controls.
For autosomal genes, breeding heterozygous parents controls for the
effects of the microbiome (Fig. 3). Breeding heterozygous parents yields all
of the groups required for phenotypic comparison when combined with
tracking of which mice come from which dam to control for phenotype
inheritance via the maternal microbiome. In the analysis of adult
mice, use of littermate controls addresses both maternal and early life
effects of the microbiome, which can imprint the immune and metabolic
systems. It is the gold standard to determine the dominance of host
versus microbiome genes in conferring a phenotype.
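The breeding scheme described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a single autosomal locus with Mendelian segregation, alleles written as "+" (wild type) and "-" (mutant), and a hypothetical record format of (dam, genotype) pairs. The function names are invented for this sketch and do not come from any published pipeline.

```python
from collections import Counter, defaultdict
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, float]:
    """Expected offspring genotype fractions at one autosomal locus.

    Each parent is a two-character string of alleles, for example "+-"
    for a heterozygote ("+" = wild-type allele, "-" = mutant allele).
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def litters_with_both(pups, control="++", mutant="--"):
    """Return dams whose litters contain both control and mutant pups.

    `pups` is an iterable of (dam_id, genotype) records (hypothetical format).
    """
    genotypes_by_dam = defaultdict(set)
    for dam_id, genotype in pups:
        genotypes_by_dam[dam_id].add(genotype)
    return sorted(
        dam for dam, seen in genotypes_by_dam.items()
        if control in seen and mutant in seen
    )


# Heterozygous x heterozygous cross: all three genotypes appear in one litter.
expected = cross("+-", "+-")

# Track which pup came from which dam, then keep litters that allow a
# within-litter comparison of wild-type and mutant animals.
pups = [("dam1", "++"), ("dam1", "--"), ("dam1", "+-"),
        ("dam2", "+-"), ("dam2", "--")]
usable = litters_with_both(pups)
```

Under these assumptions a heterozygous cross is expected to give one-quarter wild-type, one-half heterozygous and one-quarter homozygous mutant pups, and only litters that contain both wild-type and mutant animals (here, those of `dam1`) support a direct within-litter comparison.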

Importantly, littermate controls are required for studies of host genes
that impact organs distant from highly colonized organs such as the
intestine. This generalizes the importance of the microbiome in
experimental design. Metabolites produced in colonized organs can directly
influence phenotypes at distant sites including lung, pancreas and
brain. For example, the intestinal bacterial microbiome plays a key
role in resistance to pulmonary influenza virus infection. In addition,
mobile cells such as those in the immune system can modify phenotypes
in distant non-colonized organs. Thus, experimental design in all areas
of experimental biology requires consideration of microbiome effects and
therefore the use of littermate controls. These considerations
conclusively demonstrate that a simple comparison of mutant and control mice
bred independently or purchased from a facility cannot be interpreted
as definitive evidence for the role of a host gene. Comparisons between
mutant and littermate control mice can, however, identify whether the
host or microbiome is dominant in controlling the trait under study.

Figure 2 | Determining the role of the microbiome within genetically
equivalent but phenotypically different mice (as in ref. 12). A common
occurrence is that mice of the same genotype within a given colony can
show differences in phenotype. In this example, levels of faecal IgA show
variation in wild-type (WT) mice that were cage dependent: ‘High IgA’
denotes detectable levels of faecal IgA (as determined by an ELISA), while
‘Low IgA’ denotes levels at the limit of detection of the assay (top panel).
Here, this phenotype is vertically transmitted. IgA-high mice produce
progeny that are IgA high and IgA-low mice produce progeny that are IgA
low (middle panel). Two experiments using horizontal transfer methods
show dominance of the IgA-low phenotype. First, co-housed IgA-high
and -low breeding pairs produce mice that are IgA low. Second, faecal
transfer from IgA-low to IgA-high mice converts IgA-high mice to IgA-low
(bottom panel). This experiment then generated the material required to
determine that IgA was degraded by intestinal commensal microbes. These
experiments show that the IgA-low phenotype is dominant and driven by
the microbiome.

Importantly, using littermate controls does not rule out a role for 
metagenomic effects. Specifically, it may still be the microbiome that 
dominantly modulates a trait that is also dependent on a host gene (or vice 
versa; Figs 1 and 3). To address this possibility, a next step in cases with a 
dominant host genetic contribution is to define the potential role of the 
microbiome by evaluating the phenotype using additional dams, assessing 
variation between cages, using faecal transplantation or co-housing, and
by analysing mice re-derived into a separate facility (Fig. 4). These stud- 
ies can demonstrate the impact of a combination of host and microbial 
effects that must be interrogated to define the mechanisms that underlie 
experimental observations (Fig. 3). 
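The interpretive logic laid out in the captions of Figs 3 and 4a can be written down as a small decision sketch. The function names and return strings below are hypothetical and chosen only for illustration. As the text stresses, these readings are provisional: a combination of host and microbiome (metagenomic) influences may still drive a trait.

```python
def interpret_littermate_comparison(mutant_differs_from_littermates: bool) -> str:
    """Provisional reading of a mutant-versus-littermate-control comparison."""
    if mutant_differs_from_littermates:
        return "trait probably dominantly controlled by host gene(s)"
    return "trait probably dominantly controlled by the microbiome"


def interpret_transfer(faecal_transplant_transfers: bool,
                       cohousing_transfers: bool) -> str:
    """Provisional reading of horizontal-transfer experiments."""
    if faecal_transplant_transfers:
        return "microbe(s) dominant: phenotype moves between genotypes"
    if cohousing_transfers:
        # Co-housing transfers the phenotype but faecal transplant does not.
        return ("possible microbiome component that does not survive "
                "transplantation but spreads between live animals")
    return "host gene dominant: phenotype is not transferred"


first_pass = interpret_littermate_comparison(mutant_differs_from_littermates=True)
follow_up = interpret_transfer(faecal_transplant_transfers=False,
                               cohousing_transfers=True)
```

Keeping the two steps as separate functions mirrors the order of experiments in the text: the littermate comparison comes first, and transfer experiments supplement rather than replace it.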

Many studies demonstrate that phenotypes can be transferred
between mice by either co-housing or faecal transplantation.
Investigators should be aware that using this valuable approach in adult
mice can address the effects of the microbiome independent of effects on
development. In our opinion, these experiments supplement, but do not
replace, the use of littermate controls (Fig. 3). In addition to their value for
determining whether a phenotype is influenced by the microbiome after
development of the animal, these studies can provide the directionality
of the influence (Fig. 4a). The mechanism of microbial dominance can be
through effects of individual microbes, changes in community structure
or combinations of these with host genes.

Figure 3 | Use of littermate controls as a gold standard to control for
the effects of the microbiome and metagenome. When control and
mutant mice are bred independently, a difference in phenotype cannot
be simply attributed to the difference in host gene(s). This applies to all
physiologic contexts. Experiments that use littermate controls must be
performed to define whether the phenotype is dominantly controlled by
host gene(s), microbial gene(s) or a combination of both types of genes. If
the results of such experiments show the mutant mouse phenotype is still
distinct from controls, then the trait is probably dominantly controlled by
host chromosomal genes. If the mutant mouse phenotype resembles the
littermate controls regardless of host genotype, then the trait is probably
dominantly controlled by the microbiome. It is important to note that
these conclusions are not definitive; the possibility will still exist that a
combination of host and microbiome (metagenomic) influences may drive
a given trait. Additional experiments must be performed to test for this
possibility (see also Fig. 4).

Use of isobiotic mice, those with exposure to only a defined set
of organisms, is also an attractive experimental system for analysis of
microbiome effects. Germ-free animals reconstituted with defined
bacteria, human microbiome samples, or the microbiome of conventionally
raised mice also have a key role in defining the role of the microbiome
in physiology, although the effects of the microbiome on normal
development are not addressed with such experiments. Such experimental
systems have been important for establishing the role of host microbial
communities in modulating host phenotype.

Lastly, to most conclusively address the role of the metagenome,
certain key experiments should be performed in multiple animal facilities
in order to draw firm conclusions about the generality of a role of host
and/or microbiome genes in a phenotype. Such experiments are particularly
critical for new and foundational areas of biology. Just as the clinical
world relies on multi-centre trials as its gold standard for treatment
efficacy, multi-centre vivarium studies, sometimes as a follow-up of
initial observations, would solidify conclusions regarding the
generalizable role of the microbiome in a given phenotype (Fig. 4b). Such
an approach has the potential to eliminate the confusion regarding
phenotypes erroneously linked to genotypes that has befuddled many fields,
wasting time and critical resources as experiments are performed to
resolve apparent controversies. For robust mouse phenotypes that define a
specific disease model, such meta-analysis across multiple facilities
would be ground-breaking and diminish confusion surrounding the
variability of certain models. There is much folklore in the biology
community with regard to reasons for lack of reproducibility, but little
scientific action. There are several examples of low-hanging fruit that
should be addressed sooner rather than later. For example, the
facility-dependent variation in the onset of diabetes in non-obese
diabetic mice is a major impediment for the field. In addition to
investigator collaboration, funding agencies will have to recognize the
utility of this endeavour by providing funding, as will high-impact
journals by prioritizing studies that meet this demanding standard.

Figure 4 | Additional experiments to test for the relative dominance
of host chromosomal or microbiome traits. a, In addition to testing
with littermate controls (see Fig. 3), faecal transplant between control
and mutant mice can be performed. If the host gene is dominant,
the phenotype will not be transferred horizontally or by microbiome
transplantation. If a microbe (or microbes) is (are) dominant, then the
phenotype will be transferred from the mice of one genotype to the other
genotype. We note that co-housing is also a viable method and in the event
that co-housing transfers a phenotype but faecal transplant does not, it
is possible that the phenotype is due to a microbiome component (for
example, enveloped virus, anaerobic bacteria) that does not survive the
transplantation procedure but may efficiently spread between live animals.
Mouse genotype is indicated by text (control or mutant), phenotype is
indicated by colour: red, normal; blue, altered. b, A further test to evaluate
the generality of the effects of the metagenome on a phenotype is to
transfer mice to additional mouse facilities and determine the phenotype
in those facilities. This may confirm the phenotype in the initial facility
(blue box) or not (green box). If not, this is an opportunity to define why
there is an altered phenotype by comparing the environment between
different facilities in a controlled manner.


Role of microbiome sequencing 
Based on rapid advances in sequencing technologies, improvement in
databases containing sequences from microorganisms and development
of the bioinformatic capacity to map sequences to the genomes of specific
types of microorganisms, it has become increasingly clear that mammals
are composite organisms containing genes not only from host cell
chromosomes, but genes from many microbes. In our opinion,
however, simply reporting the ‘sequence’ of the microbiome does not
solve the problem of incorporating effects of the microbiome into analysis
of experimental findings. This is because the ‘sequenome’, meaning the
total sequences that are obtained from sequencing random nucleic acid
fragments from the microbiome (shotgun libraries), does not completely
describe the metagenome. There are multiple reasons for this mismatch.
First, not all elements of the microbiome (that is, archaeal, bacterial, viral,
fungal, etc.) can be analysed by the same sample preparations. This is a
challenge for limited samples as well as for bioinformatic comparisons of
data sets derived from different sample preparations. Second, shotgun
sequence libraries contain many sequences that cannot be annotated to
any organism in current databases. This genetic ‘dark matter’ probably
contains sequences from as-yet-unidentified, but physiologically
important, components of the microbiome. Inability to characterize the entire
sequenome is a key limitation to current studies of the metagenome.
Third, functional microbes may be of such relatively low abundance that
they may not be accounted for by sequencing. Lastly, the functional
impact of metabolites or particles such as outer membrane vesicles will
not be detected by sequencing.
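The third point, that low-abundance microbes can escape detection, can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an idealized model that is not taken from the text: each read is drawn independently and at random, so a microbe contributing a fraction f of all nucleic acid fragments is seen at least once in N reads with probability 1 - (1 - f)^N. Real libraries deviate from this (extraction bias, genome size, amplification), so the numbers are indicative only, and the function name is hypothetical.

```python
import math


def detection_probability(relative_abundance: float, n_reads: int) -> float:
    """Chance of sampling at least one read from a microbe.

    Assumes independent, unbiased sampling of reads (an idealization).
    """
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must lie between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    if relative_abundance == 1.0:
        return 1.0 if n_reads > 0 else 0.0
    # 1 - (1 - f)^N, computed with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(n_reads * math.log1p(-relative_abundance))


# A microbe at one part per million, sequenced to one million reads.
p_ppm = detection_probability(1e-6, 1_000_000)

# A microbe at one part per billion at the same depth.
p_ppb = detection_probability(1e-9, 1_000_000)
```

Under this model a one-in-a-million organism is more likely than not to yield at least one read at a depth of a million reads, whereas a one-in-a-billion organism is almost always missed, even though either could be functionally important.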

Analysis of the bacterial microbiome has advanced more rapidly than
analysis of other organisms within the microbiome. This has occurred in
part because of robust databases for bacterial 16S rRNA genes or
programs that link shotgun sequences to specific bacterial taxa (for
example, MetaPhlAn), allowing the investigator to correlate sequence with
the presence of specific bacteria (often down to the level of genus or
species). However, because analysis of the bacterial microbiome has
developed more rapidly than analysis of the metagenome, there has been an
unfortunate tendency in the literature and in scientific presentations (and
thought) to use the term ‘microbiome’ to refer to bacteria only, ignoring
other types of organisms. In addition, more attention has been paid to
microbiome members with DNA genomes than those with RNA genomes
due to the use of libraries made without reverse transcription. There are
many members of the microbiome with RNA genomes; for example, many
enteric RNA viruses are detected in apparently healthy mammals.

The number of organisms, and therefore the number of genes, constituting
the microbiome is enormous. In the intestine alone, it is estimated that
there are about 100 trillion prokaryotic cells at densities of up to
10^11-10^12 cells ml^-1 in the colon (fewer more proximally). Bacteriophages
that infect these prokaryotes may be roughly tenfold more abundant
than their respective host cells. Further, humans carry an average of at
least ten permanent chronic eukaryotic viral infections that substantially
influence immunity and host gene expression, and faecal samples from
healthy children contain a range of eukaryotic viruses including members
of genera that contain pathogens. Mammalian chromosomes contain
multiple genetic elements related to retroviruses, and recombination
amongst these can generate new infectious retroviruses in mice,
indicating that our own genomes may be the source of novel elements of the
virome. Members of the virome may also be living within other
eukaryotic organisms, such as parasites, that infect the host. The microbiome
also contains a fungal mycobiome and meiofauna such as protozoa and
helminthic worms. Thus, defining two mice as having equivalent
microbiomes by shotgun sequencing is currently not possible, and the field
is too immature to mandate specific standards for microbiome analysis
due to the rapid evolution of this discipline.


Should we standardize the microbiome? 

It is clearly challenging to choose even a set of bacteria for a given set of
experiments (for example, using isobiotic mice). Further, this approach
will miss the critical effects of other components of the microbiome
(viruses, etc.). In addition, it is important not to assume that the different
components of the microbiome act independently; instead trans-kingdom
interactions between components of the microbiome can dramatically
affect the biology of the host. Bacteriophages participate in horizontal
gene transfer between prokaryotes and participate in predator-prey
relationships, creating a dynamic community structure composed of
the bacterial microbiome and the bacteriophage component of the
virome. Replicative bacteriophages that can infect other bacteria can
be induced by environmental factors such as nitric oxide, antibiotics
and nutrient availability. Thus, the environment can control the
bacterial microbiome via effects on the virome.

In turn, the bacterial microbiome and individual bacterial products can
regulate the ability of eukaryotic viruses to establish both acute and
persistent infection in mice. Chronic systemic viral infections control
the level of innate and adaptive immunity to bacteria, parasites, other
viruses and tumours, and regulate autoimmunity. Vaccine responses
to viruses are controlled by the microbiome.

The intestinal meiofauna is complex and present in asymptomatic
individuals, and may alter inflammatory bowel disease, multiple
sclerosis, rheumatoid arthritis, type 1 diabetes and asthma. As an
example, sequencing the mycobiome in faecal samples from apparently
healthy individuals detected fungi in every sample tested and over 50
fungal genera.

This level of taxonomic complexity, and the existence of trans-kingdom 
interactions within the metagenome that have important biological 
effects, means that defined microbiomes cannot yet be a standard for 
all experiments if we expect biological experimentation to encompass 
all mechanisms that operate in human populations. Focusing on a single
‘standard’ microbiome will prevent analysis of major variations in biology 
that occur when the microbiome varies in the same chromosomal genetic 
background, and thus important opportunities will be missed to discover 
new interactions of the microbiome with the host. 


Addressing challenges of metagenomic effects 

Given all of the above, we are confronted, as investigators and recipients of
research support, with the problem of how to deal with these complex but
demonstrably pertinent considerations. We feel that the significant problems
that contribute to poor experimental reproducibility, to misinterpretation
of experimental findings and to our failure to realize the full potential of
studies of the microbiome and metagenome to generate new paradigms
in biology cannot be solved simply by dictating and enforcing ‘favoured’
experimental procedures for the design of mouse experiments. While we
do strongly favour certain methods based on scientific evidence (see 
description of littermate controls and other valuable approaches above), 
proposing ‘rules’ meant to rigidly apply to all studies would be unproduc- 
tive. Instead, we feel strongly that a specific scientific enterprise (such as 
the use of mouse models), if given the proper tools and framework, can 
move towards reproducibility without compromising new discoveries or 
regimenting science. 

We have evaluated how other scientific communities that rely on com- 
parability and integrity of data across institutions and investigators deal 
with these issues. We are particularly impressed with the fast evolution 
of the expression microarray field. In the early days of this field (the late 
1990s), there was much concern that data from experiments could not 
readily be compared between laboratories or platforms and there were 
substantial concerns about the overall quality and utility of the data. 
Instead of dictating a single favoured platform, the field embraced the 
complexity of technical development of platforms by simply reporting 
how things were done. There were two key proactive developments. First 
was the insistence that all parameters of an experiment be reported (the 
consensus became reporting minimum information about a microarray 
experiment, MIAME). This greatly facilitated comparison of platforms 
and identified trouble spots. This approach was quite effective at stream- 
lining procedures and facilitating cross comparisons across platforms. 
Second, microarray data were required to be deposited and made freely
available upon publication. Journal editors in particular have been impor- 
tant enforcers of such depositions. 

Thus, we propose to mandate the creation of reporting systems for all 
experimental models that will facilitate cross-comparison of experimental 
design and data obtained from mouse experiments. While we suggest a set
of parameters to be reported (Box 2), we recognize that this list is probably
incomplete and will need refinement by the research community. We
feel that transparent sharing of information is the missing piece needed
to solve this set of problems, and that participation in such a system is an
ethical responsibility for investigators who use public and private funds.
Some microbiome researchers have already embraced this type of
transparency, and steps are being taken to provide frameworks for reporting
of key variables in mouse experiments. Consensus conferences would
be the logical next step to produce a concrete list of information that
investigators are required to share (see below).


Facilitating mouse model reproducibility 
The solution of the metagenomics ‘problem’ must be addressed simul- 
taneously from multiple sources and through the actions of different 
stakeholder groups. This will require recognition of this issue at many 
levels as well as the motivation to take appropriate action. We delineate 
the responsibilities of various entities in improving experimental repro- 
ducibility and enhancing the value of investments in research below: 
(1) Individual investigators. It is incumbent on the individual investigator 
to acknowledge and control for the metagenome in all mouse experiments 
in all areas of investigation for reasons described in this review. For labs 
that use mouse models, this level of understanding must be effectively 
communicated to all members of a laboratory including technicians, grad- 
uate students and postdocs. We recommend that littermate controls are the 
minimum gold standard for experimentation for non-gnotobiotic facil- 
ities. We recognize that other methods may be used by investigators and 
appropriate interpretations as to the role of the microbiome and metage- 
nome must be made based on the method (Figs 3 and 4). Details of mouse 
experimental parameters must be clearly documented (for example, 
Box 2); this will allow the community to judge the value of reported 
findings and attempt to replicate them. If sequencing of the microbiome 
is undertaken, the methods for generating these data and the primary 
sequence files must be made freely available. Reported information will 
provide investigators the ability to evaluate the credibility of the results 
and place them in a proper context in the field. 
(2) Government and business sectors. We must define MIAME-like crite- 
ria for mouse experiments. Many of the features are touched on in this 
review, although we recognize that this is not an all-inclusive set of con- 
siderations and reflects our own opinions rather than an expert panel 
or community consensus. Thus input and consensus across disciplines 
will be required to define required reporting elements. We propose that 
a series of consensus conferences, to include the key stakeholders, be 
established to generate and update these MIAME-like criteria. Further, a 
fully funded commission or committee should be created to monitor the 
science, study the specific effects of policies put in place, adapt standards 
for reporting to the literature, and to identify and remediate unanticipated 
negative consequences of well-meaning policies that will certainly arise. 
An essential element of a successful plan will be the development and 
maintenance of data repositories to allow investigators and others to read- 
ily search for mouse phenotypes with a given host background and com- 
pare associated experimental details for each experiment. Standardized 
methods of reporting such information will facilitate and push fields 
in the direction of best practices that facilitate reproducibility. Such a 
database will be transformative for many fields as it will finally allow for 
facile cross comparison of experiments within a defined host genotype. 
Providing links to other OMICs data collections (for example, transcription
microarrays, RNA-seq, microbiome analysis) would increase the power
of this approach. Only government agencies, perhaps in combination with 
industry, are well positioned to foster, fund, and maintain such databases. 
(3) Journal editors and reviewers. In order to publish, investigators must 
provide data critical for interpretation and establishing the reproducibility 
of results. Journals must enforce reporting of key experimental param- 
eters and deposition of data in central databases for mouse phenotypes. 
Editors and journal policies must provide the ‘teeth’ for this endeavour. 
Without the commitment of the editorial process, this idea will die, and 
the reproducibility and quality of science will continue to be less than
optimal. The pressure to provide this information in order to publish is
a very powerful motivational force. Furthermore, in the review process,
editors will need to give precedence for publication to studies that address
effects of the microbiome and metagenome over studies that fail to do
so. The end product will make experimental conditions transparent and
comparable, which will have a profound effect on the value of the scientific
enterprise to the public.

(4) Academic institutions. The leadership of academic institutions (for
example, presidents, governing boards, chancellors, provosts, deans,
department and division heads) must develop directed plans to create
the infrastructure and faculty expertise that is required to properly
perform animal experimentation in a metagenomic world. This will require
money. It is critical that individual investigators using mouse models
have the expertise available within their institution or through consortia
to address issues of the metagenome. This includes access to
computational programs, methods to evaluate the microbiome as well as experts
in computational biology and various disciplines of microbiology (that
is, bacteriologists, virologists).

(5) Granting institutions, study section heads and reviewers. Grant
proposals must document practices and controls for mouse experiments.
Adherence to defined community-designed and -supported best
practices will need to be promised and acted upon. Experimental procedures
that account for the microbiome and metagenome must be described.
Institutional resources available to the investigator must be documented
so that it is clear how these issues will be addressed. Lastly, plans for
depositing data and information about experimental parameters
must be described. These details should be required for all grant
proposals that use mouse models.


Example of types of data to be provided for submitted manuscripts with mouse experiments

Host genetics
* Specify strain using JAX or other commercial vendor nomenclature.
* Original source of purchased or shared mice used as breeders to create the colony.
* For mixed background, include data defining strain percentage (microsatellite analysis, number of markers).
* Define the method used to create the mutation (for example, homologous recombination in embryonic stem cells, transposon mutagenesis, chemical mutagenesis, Cas/CRISPR systems). Show data validating the altered allele.

Experimental methods within mouse facility
* Source of experimental and control mice (for example, bred in facility, purchased from specified vendor; for the latter, interval from arrival in facility to experiment).
* Breeding scheme to generate experimental and control mice.
* Control for microbiome effect (for example, littermates, multiple dams, co-housing, faecal transplant, gnotobiotic).
* Number and gender of mice analysed per experiment.
* Number of experiments performed.
* Number of breeding pairs used to generate progeny for analysis.
* Antibiotic exposure (type and duration) of breeders and progeny.

Husbandry details
* Diet source (vendor, nutrient composition), storage (temperature, duration) and treatment (irradiation, autoclave).
* pH of drinking water.
* Pathogen screening (organisms tested for, methods, source of analysis in house versus commercial vendor).
* Caging type (for example, ventilated, metabolic).
* Bedding amount per cage and type.
* Frequency and protocol for cage changing.
* Light-dark cycle of room.
* Temperature of room (include range).

Microbiome analysis
* Methods of sample collection, library preparation.
* Analytical pipeline including version and database dates.
* Methods of statistical analysis.
* Specify method used if corrections for multiple comparisons were performed.
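
As an illustration only, a checklist of this kind could be held as a machine-readable record and checked for gaps before submission. The sketch below assumes a nested dictionary as the report format; the section and field names are hypothetical labels for a few of the items listed above, not a published schema.

```python
# Hypothetical field names standing in for a subset of the items listed
# above; neither the names nor the grouping are a published schema.
REQUIRED_FIELDS = {
    "host_genetics": ["strain", "breeder_source", "mutation_method"],
    "experimental_methods": ["mouse_source", "breeding_scheme",
                             "microbiome_control", "mice_per_experiment"],
    "husbandry": ["diet", "water_ph", "caging_type", "light_dark_cycle"],
    "microbiome_analysis": ["sample_collection", "pipeline_version",
                            "statistical_methods"],
}


def missing_fields(report):
    """Return 'section.field' names that are absent or empty in a report dict."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        provided = report.get(section, {})
        for field in fields:
            if not provided.get(field):
                missing.append(f"{section}.{field}")
    return missing
```

A reviewer-facing tool could list the returned names next to the manuscript so that omissions are visible at a glance.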
Conclusion

We expect that some of the opinions expressed here will be controversial.
However, although such efforts are hard to implement and expensive, we
believe that the data are now conclusive that they are required to optimize
experimental biology in the coming decades. The first step will be
recognition and acceptance of the importance of the effects of the
microbiome and metagenome. This can only come if investigators have access
to the information about other experiments in the literature that is
required for interpretation and reproduction of findings. In the end, the
microbiome and metagenome are coming into their own as a field driving
innovation in experimental biology. We believe that the complexity of
the metagenome should be embraced for its potential to change how we
view ourselves and fellow mammals as organisms, and for the potential
to truly understand who we are and how to better our lives. After all, we
are composite organisms containing far more genes than are encoded in
our own chromosomes.
Modern humans arrived in Europe ~45,000 years ago, but little is known about their genetic composition before the start
of farming ~8,500 years ago. Here we analyse genome-wide data from 51 Eurasians from ~45,000-7,000 years ago. Over
this time, the proportion of Neanderthal DNA decreased from 3-6% to around 2%, consistent with natural selection against
Neanderthal variants in modern humans. Whereas there is no evidence of the earliest modern humans in Europe contributing 
to the genetic composition of present-day Europeans, all individuals between ~37,000 and ~14,000 years ago descended 
from a single founder population which forms part of the ancestry of present-day Europeans. An ~35,000-year-old 
individual from northwest Europe represents an early branch of this founder population which was then displaced across a 
broad region, before reappearing in southwest Europe at the height of the last Ice Age ~19,000 years ago. During the major 
warming period after ~14,000 years ago, a genetic component related to present-day Near Easterners became widespread in 
Europe. These results document how population turnover and migration have been recurring themes of European prehistory. 


Modern humans arrived in Europe around 45,000 years ago and have
lived there ever since, even during the Last Glacial Maximum
25,000-19,000 years ago when large parts of Europe were covered in ice. A
major question is how climatic fluctuations influenced the population
history of Europe and to what extent changes in material cultures
documented by archaeology corresponded to movements of people.
To date, it has been difficult to address this question because
genome-wide ancient DNA has been retrieved from just four Upper
Palaeolithic individuals from Europe. Here we assemble and analyse
genome-wide data from 51 modern humans dating from 45,000 to 7,000 years
ago (Extended Data Table 1; Supplementary Information section 1).


Ancient DNA retrieval 

We extracted DNA from human remains in dedicated clean rooms,
and transformed the extracts into Illumina sequencing libraries.
A major challenge in ancient DNA research is that the vast majority 


1 Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing 100044, China.
2 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
3 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
4 Institute for Archaeological Sciences, Archaeo- and Palaeogenetics, University of Tübingen, 72070 Tübingen, Germany.
5 Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany.
6 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
7 Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA.
8 School of Archaeology and Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland.
9 CIAS, Department of Life Sciences, University of Coimbra, 3000-456 Coimbra, Portugal.
10 Australian Centre for Ancient DNA, School of Biological Sciences, The University of Adelaide, SA-5005 Adelaide, Australia.
11 Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
12 Institute of Archaeology and Ethnography, Russian Academy of Sciences, Siberian Branch, 17 Novosibirsk, RU-630090, Russia.
13 Altai State University, Barnaul, RU-656049, Russia.
14 Dipartimento di Civiltà e Forme del Sapere, Università di Pisa, 56126 Pisa, Italy.
15 Department of Biology, University of Pisa, 56126 Pisa, Italy.
16 Direction régionale des affaires culturelles Rhône-Alpes, 69283 Lyon, Cedex 01, France.
17 Dipartimento di Biologia, Università degli Studi di Bari ‘Aldo Moro’, 70125 Bari, Italy.
18 Instituto Internacional de Investigaciones Prehistóricas, Universidad de Cantabria, 39005 Santander, Spain.
19 Department of Anthropology, MSC01 1040, University of New Mexico, Albuquerque, New Mexico 87131-0001, USA.
20 Quaternary Archaeology, Institute for Oriental and European Archaeology, Austrian Academy of Sciences, 1010 Vienna, Austria.
21 Department of Anthropology, Natural History Museum Vienna, 1010 Vienna, Austria.
22 Department of Anthropology, University of Vienna, 1090 Vienna, Austria.
23 “Emil Racoviță” Institute of Speleology, 010986 Bucharest 12, Romania.
24 “Emil Racoviță” Institute of Speleology, Cluj Branch, 400006 Cluj, Romania.
25 Department of Cultural Heritage, University of Bologna, 48121 Ravenna, Italy.
26 Sezione di Scienze Preistoriche e Antropologiche, Dipartimento di Studi Umanistici, Università di Ferrara, 44100 Ferrara, Italy.
27 Università degli Studi di Bari ‘Aldo Moro’, 70125 Bari, Italy.
28 Museo di “Civiltà preclassiche della Murgia meridionale”, 72017 Ostuni, Italy.
29 Dipartimento di Biologia, Università di Firenze, 50122 Florence, Italy.
30 Dipartimento di Scienze Fisiche, della Terra e dell’Ambiente, U.R. Preistoria e Antropologia, Università degli Studi di Siena, 53100 Siena, Italy.
31 CNRS/UMR 7041 ArScAn MAE, 92023 Nanterre, France.
32 INRAP/UMR 8215 Trajectoires 21, 92023 Nanterre, France.
33 Ulmer Museum, 89073 Ulm, Germany.
34 University of Bucharest, Faculty of Geology and Geophysics, Department of Geology, 01041 Bucharest, Romania.
35 Department of Anthropology, California State University Northridge, Northridge, California 91330-8244, USA.
36 Université de Bordeaux, CNRS, UMR 5199-PACEA, 33615 Pessac Cedex, France.
37 TRACES - UMR 5608, Université Toulouse Jean Jaurès, Maison de la Recherche, 31058 Toulouse Cedex 9, France.
38 Royal Belgian Institute of Natural Sciences, 1000 Brussels, Belgium.
39 Department of Archaeology, School of Culture and Society, Aarhus University, 8270 Højbjerg, Denmark.
40 Service Régional d’Archéologie de Franche-Comté, 25043 Besançon Cedex, France.
41 Laboratoire Chronoenvironnement, UMR 6249 du CNRS, UFR des Sciences et Techniques, 25030 Besançon Cedex, France.
42 Department of Geosciences, Biogeology, University of Tübingen, 72074 Tübingen, Germany.
43 Senckenberg Centre for Human Evolution and Palaeoenvironment, University of Tübingen, 72072 Tübingen, Germany.
44 Department of Early Prehistory and Quaternary Ecology, University of Tübingen, 72070 Tübingen, Germany.
45 Institute for Archaeological Sciences, Paleoanthropology, University of Tübingen, 72070 Tübingen, Germany.
46 Museum of Anthropology and Ethnography, Saint Petersburg 34, Russia.
47 Department of Anthropology, Faculty of Science, Masaryk University, 611 37 Brno, Czech Republic.
48 Institute of Archaeology at Brno, Academy of Science of the Czech Republic, 69129 Dolní Věstonice, Czech Republic.
49 Department of Archaeology, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada.


*These authors contributed equally to this work. 
§These authors jointly supervised this work. 


00 MONTH 2016 | VOL 000 | NATURE | 1 


© 2016 Macmillan Publishers Limited. All rights reserved 




[Figure 1: map of sampling locations with bars coloured by cluster (El Mirón, Mal’ta, Satsurblia, Unassigned, Věstonice, Villabruna); age scale marked at 20, 30, 40 and 50 ka.]


Figure 1 | Location and age of the 51 ancient modern humans. Each bar 
corresponds to an individual, the colour code designates the genetically 
defined cluster of individuals, and the height is proportional to age (the 
background grid shows a projection of longitude against age). To help in 


of the DNA extracted from most specimens is of microbial ori- 
gin, making random shotgun sequencing prohibitively expensive. 
We addressed this problem by enriching the libraries for between 
390,000 and 3.7 million single nucleotide polymorphisms (SNPs) in 
the nuclear genome via hybridizing to pools of previously synthesized 
52-base-pair oligonucleotide probes targeting these positions. This 
makes it possible to generate genome-wide data from samples with 
high percentages of microbial DNA that are not practical to study by 
shotgun sequencing. We sequenced the isolated DNA fragments
from both ends, and mapped the consensus sequences to the human 
genome (hg19), retaining fragments that overlapped the targeted 
SNPs. After removing fragments with identical start and end posi- 
tions to eliminate duplicates produced during library amplification, 
we chose one fragment at random to represent each individual at 
each SNP. 
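
The last two steps of this paragraph (collapsing amplification duplicates, then drawing one fragment at random per SNP) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the fragment representation (a dictionary with `chrom`, `start`, `end` and `seq` keys, half-open coordinates, no insertions or deletions) and both function names are assumptions made for the example.

```python
import random
from collections import defaultdict


def remove_duplicates(fragments):
    """Keep the first fragment seen for each (chrom, start, end) triple."""
    seen = set()
    unique = []
    for frag in fragments:
        key = (frag["chrom"], frag["start"], frag["end"])
        if key not in seen:
            seen.add(key)
            unique.append(frag)
    return unique


def pseudo_haploid_calls(fragments, snp_positions, rng):
    """Pick one overlapping fragment at random per SNP and report its base.

    Assumes each fragment's `seq` is aligned to the reference without
    indels, so the base at `pos` is seq[pos - start].
    """
    by_snp = defaultdict(list)
    for frag in fragments:
        for chrom, pos in snp_positions:
            if frag["chrom"] == chrom and frag["start"] <= pos < frag["end"]:
                by_snp[(chrom, pos)].append(frag)
    calls = {}
    for (chrom, pos), overlapping in by_snp.items():
        chosen = rng.choice(overlapping)
        calls[(chrom, pos)] = chosen["seq"][pos - chosen["start"]]
    return calls
```

Passing an explicit `random.Random` instance as `rng` keeps the random draw reproducible from a seed.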

Contamination from present-day human DNA is a danger in ancient
DNA research. To address this, we took advantage of three
characteristic features of ancient DNA (Supplementary Information section
2). First, for an uncontaminated specimen, we expect only a single
mitochondrial DNA sequence to be present, allowing us to detect
contamination as a mixture of mitochondrial sequences. Second,
because males carry a single X chromosome, we can detect
contamination in male specimens as polymorphisms on chromosome X.
Third, cytosines at the ends of genuine ancient DNA molecules
are often deaminated, resulting in apparent cytosine to thymine
substitutions, and thus we can filter out contaminating molecules
by restricting analysis to those with evidence of such deamination.
For libraries from males with evidence of mitochondrial DNA
contamination or X chromosomal contamination estimates >2.5%, as
well as for all libraries from females, we restricted the analyses to
sequences with evidence of cytosine deamination (Supplementary
Information section 2). After merging libraries from the same
individual and limiting to individuals with >4,000 targeted SNPs covered
at least once, 38 individuals remained, which we merged with newly
generated shotgun sequencing data from the Karelia individual
(2.0-fold coverage), and published data from ancient and
present-day humans. The final data set includes 51 ancient modern
humans, of which 16 had at least 790,000 SNPs covered (Fig. 1;
Extended Data Table 1).
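
The filtering rule in this paragraph can be sketched as two small functions. The 2.5% threshold and the treatment of female libraries come from the text; the function names, the aligned read/reference string representation, and the choice to inspect only the first and last base are assumptions made for illustration.

```python
def shows_deamination(read, reference, n_terminal=1):
    """True if a terminal position reads T where the reference has C.

    `read` and `reference` are equal-length aligned strings. Checking only
    `n_terminal` bases at each end is an assumption of this sketch.
    """
    length = len(read)
    ends = list(range(n_terminal)) + list(range(length - n_terminal, length))
    return any(reference[i] == "C" and read[i] == "T" for i in ends)


def restrict_to_deaminated(sex, mt_contaminated, x_contamination):
    """Decide whether a library should be limited to deaminated sequences.

    Females cannot be tested on chromosome X, so they are always restricted;
    males are restricted if mitochondrial contamination is evident or the
    X chromosomal estimate exceeds 2.5%.
    """
    if sex == "F":
        return True
    return bool(mt_contaminated) or x_contamination > 0.025


def filter_reads(pairs, restrict):
    """Keep all (read, reference) pairs, or only deaminated ones if restricted."""
    if not restrict:
        return list(pairs)
    return [(r, ref) for r, ref in pairs if shows_deamination(r, ref)]
```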
visualization, we add jitter for sites with multiple individuals from nearby
locations. Four individuals from Siberia are plotted at the far eastern edge
of the map. ka, thousand years ago.


Natural selection reduced Neanderthal ancestry over time 

We used two previously published statistics to test if the
proportion of Neanderthal ancestry in Eurasians changed over the last 45,000
years. Whereas on the order of 2% of present-day Eurasian DNA is
of Neanderthal origin (Extended Data Table 2), the ancient modern
human genomes carry significantly more Neanderthal DNA (Fig. 2)
(P ≪ 10^-12). Using one statistic, we estimate a decline from 4.3-5.7%
from a time shortly after introgression to 1.1-2.2% in Eurasians today
(Fig. 2). Using the other statistic, we estimate a decline from 3.2-4.2%
to 1.8-2.3% (Extended Data Fig. 1 and Extended Data Table 3). Because
all of the European individuals we analysed dating to between 37,000
and 14,000 years ago are consistent with descent from a single founding
population, admixture with populations with lower Neanderthal
ancestry cannot explain the steady decrease in Neanderthal-derived
DNA that we detect during this period, showing that natural selection
against Neanderthal DNA must have driven this phenomenon (Fig. 2).
We also obtained an independent line of evidence for selection
from our observation that the decrease in Neanderthal-derived
alleles is more marked near genes than in less constrained regions
of the genome (P = 0.010) (Extended Data Table 3; Supplementary
Information section 3).

Figure 2 | Decrease of Neanderthal ancestry over time. Plot of
radiocarbon date against Neanderthal ancestry for individuals with
at least 200,000 SNPs covered, along with present-day Eurasians (standard
errors are from a block jackknife). The least squares fit (grey) excludes
the data from Oase1 (an outlier with recent Neanderthal ancestry) and
three present-day European populations (known to have less Neanderthal
ancestry than east Asians). The slope is significantly negative for all eleven
subsets of individuals we analysed (10^-29 < P < 10^-11 based on a block
jackknife) (Extended Data Table 3). BP, before present.
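
The least squares fit described for Fig. 2, with named data points left out before fitting, can be sketched as below. This is an unweighted ordinary least squares fit written for illustration; whether the published fit weights points by their standard errors is not stated here, and the `(label, date, ancestry)` tuple format is an assumption. The sign of the slope depends on whether dates are entered as years before present or as time moving forward.

```python
def least_squares_fit(points, exclude=()):
    """Fit ancestry = intercept + slope * date by ordinary least squares.

    `points` is a list of (label, date, ancestry) tuples; any label listed
    in `exclude` (for example an outlier) is dropped before fitting.
    """
    kept = [(x, y) for label, x, y in points if label not in exclude]
    n = len(kept)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    if sxx == 0:
        raise ValueError("all dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```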


Chromosome Y, mtDNA, and significant mutations 

We used the proportion of sequences mapping to the Y chromosome
to infer sex (Extended Data Table 4; Supplementary Information
section 4), and determined Y chromosome haplogroups for the males.
We were surprised to find haplogroup R1b in the ~14,000-year-old
Villabruna individual from Italy. While the predominance of R1b in
western Europe today owes its origin to Bronze Age migrations from
the eastern European steppe, its presence in Villabruna and in a
~7,000-year-old farmer from Iberia documents a deeper history of
this haplotype in more western parts of Europe. Additional evidence
of an early link between West and East comes from the HERC2 locus,
where a derived allele that is the primary driver of light eye colour
in Europeans appears nearly simultaneously in specimens from Italy
and the Caucasus ~14,000-13,000 years ago. Extended Data Table 5
presents results for additional alleles of biological importance. When
analysing the mitochondrial genomes we noted the presence of
haplogroup M in a ~27,000-year-old individual from southern
Italy (Ostuni1), in agreement with the observation that this
haplogroup, which today occurs in Asia and is absent in Europe, was
present in pre-Last Glacial Maximum Europe and was
subsequently lost. We also find that the ~33,000-year-old Muierii2 from
Romania carries a basal version of haplogroup U6, in agreement
with the hypothesis that the presence of derived versions of this
haplogroup in North Africans today is due to back-migration from
western Eurasia.


Genetic clustering of the ancient specimens 

This data set provides an unprecedented opportunity to study the
population history of Upper Palaeolithic Europe over more than 30,000
years. In order not to prejudice any association between genetic and
archaeological groupings among the individuals studied, we first
allowed the genetic data alone to drive the groupings of the specimens,
and only afterward examined their associations with archaeological
cultural complexes. We began by computing f3-statistics of the form
f3(X, Y; Mbuti), which measure shared genetic drift between a pair of
ancient individuals after divergence from an outgroup (here Mbuti
from sub-Saharan Africa) (Fig. 3a and Extended Data Fig. 2). Through
multi-dimensional scaling (MDS) analysis of this matrix (Fig. 3b), as
well as through D-statistic analyses (Supplementary Information
section 5), we identify five clusters of individuals who share
substantial amounts of genetic drift. We name these clusters after the oldest
individual in each cluster with >1.0-fold coverage (Supplementary
Information section 5; Extended Data Table 1). In contrast, we were
not able to identify clear structure among the individuals studied
based on model-based clustering, which may reflect the fact that
many of the individuals are so ancient that present-day human
variation is not very relevant to understanding their patterns of genetic
differentiation. The ‘Věstonice Cluster’ is composed of 14 pre-Last
Figure 3 | Genetic clustering of the ancient modern humans. a, Shared
genetic drift measured by f3(X, Y; Mbuti) among individuals with at least
30,000 SNPs covered (for AfontovaGora3, ElMiron, Falkenstein, GoyetQ-2,
GoyetQ53-1, HohleFels49, HohleFels79, LesCloseaux13, Ofnet, Ranchot88
and Rigney1, we use all sequences for higher resolution). Lighter colours
indicate more shared drift. b, Multi-dimensional scaling (MDS) analysis,
computed using the cmdscale function in R, highlights the main
genetic groupings analysed in this study: Věstonice Cluster (brown), Mal’ta
Cluster (pink), El Mirón Cluster (yellow), Villabruna Cluster (light green),
and Satsurblia Cluster (dark purple). The affinity of GoyetQ116-1 (dark
green) to the El Mirón Cluster is evident in both views of the data.
Figure 4 | Population history inferences. a, Admixture graph relating
selected high coverage individuals. Dashed lines show inferred
admixture events; the estimated mixture proportions fitted using the
ADMIXTUREGRAPH software are labelled (the estimated genetic drift
on each branch is given in a version of this graph shown in Supplementary
Information section 6). The individuals are positioned vertically based
on their radiocarbon date, but we caution that the population split times
are not accurately known. Colour is used to highlight important early
branches of the European founder population: the Kostenki14 lineage
is modelled as the predominant contributor to the Věstonice Cluster


Glacial Maximum individuals from 34,000-26,000 years ago, who are
all associated with the archaeologically defined Gravettian culture. The
‘Mal’ta Cluster’ is composed of three individuals who lived
24,000-17,000 years ago in the Lake Baikal region of Siberia. The
‘El Mirón Cluster’ is composed of seven post-Last Glacial Maximum
individuals from 19,000-14,000 years ago, who are all associated with
the Magdalenian culture. The ‘Villabruna Cluster’ is composed of
15 post-Last Glacial Maximum individuals from 14,000-7,000 years
ago, associated with the Azilian, Epipaleolithic and Mesolithic cultures.
The ‘Satsurblia Cluster’ is composed of two individuals from
13,000-10,000 years ago from the southern Caucasus. Ten individuals were
not assigned to any cluster, either because they represented distinct
early lineages (Ust’-Ishim, Oase1, Kostenki14, GoyetQ116-1, Muierii2,
Cioclovina1 and Kostenki12), because they were admixed between
clusters (Karelia or Motala12), or because they were of very different
ancestry (Stuttgart). To classify the ancestry of additional low-coverage
individuals, we built an admixture graph that fits the allele frequency
correlation patterns among high-coverage individuals (Fig. 4a;
Supplementary Information section 6). We fit each low-coverage
individual into the graph in turn, using all DNA fragments from these
individuals, rather than just fragments with evidence of cytosine
deamination, and account for contamination by modelling (Supplementary
Information section 7).
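
The two computational steps behind the clustering, an outgroup f3-statistic for every pair of individuals followed by multi-dimensional scaling, can be sketched as follows. The f3 function uses the basic definition (the mean over SNPs of the product of frequency differences from the outgroup) and omits the finite-sample correction that dedicated software applies. The MDS function returns only the first coordinate, found by power iteration on the double-centred matrix. How the f3 matrix was converted to dissimilarities is not stated in the text, so the MDS function simply takes a symmetric distance matrix. All function names are hypothetical.

```python
import math


def outgroup_f3(x, y, out):
    """Mean of (out - x) * (out - y) over SNPs covered in all three.

    Each argument is a list of allele frequencies (0 or 1 for a
    pseudo-haploid individual); None marks a missing SNP.
    """
    terms = [(o - a) * (o - b) for a, b, o in zip(x, y, out)
             if a is not None and b is not None and o is not None]
    if not terms:
        raise ValueError("no SNPs covered in all three populations")
    return sum(terms) / len(terms)


def f3_matrix(genotypes, out):
    """Pairwise outgroup f3 for a dict mapping name -> frequency list."""
    names = sorted(genotypes)
    matrix = [[outgroup_f3(genotypes[a], genotypes[b], out) for b in names]
              for a in names]
    return names, matrix


def classical_mds_1d(dist, iterations=200):
    """First classical-MDS coordinate of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            return [0.0] * n
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [c * math.sqrt(max(eigenvalue, 0.0)) for c in v]
```

Higher f3 values mean more shared drift, so individuals in the same cluster should sit close together once the matrix is converted to distances and embedded.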


A founding population for Europeans 37-14 ka 

A previous genetic analysis of early modern humans in Europe using
data from the ~37,000-year-old Kostenki14 suggested that the
population to which Kostenki14 belonged harboured within it the three
major lineages that exist in mixed form in Europe today: (1) a
lineage related to all later pre-Neolithic Europeans; (2) a ‘Basal Eurasian’
lineage that split from the ancestors of Europeans and east Asians
before they separated from each other; and (3) a lineage related to
the ~24,000-year-old Mal’ta1 from Siberia. With our more extensive
sampling of Ice Age Europe, we find no support for this. When we test 
whether the ~45,000-year-old Ust’-Ishim—an early Eurasian without 
(green); the GoyetQ116-1 lineage as the predominant contributor to the
El Mirón Cluster (red); and the Villabruna lineage as broadly represented
across many clusters. b, Drawing together of European and Near Eastern
populations ~14,000 years ago. Plot of affinity of each pre-Neolithic
European population X to non-Africans outside Europe Y moving forward
in time, comparing to Kostenki14 as a baseline; values more than 3 standard
errors below zero (Z < -3) are indicated with filled symbols (we restricted to
individuals with >50,000 SNPs). We observe an affinity to Near Easterners
beginning with the Villabruna Cluster, and another to east Asians that
affects a subset of the Villabruna Cluster.


any evidence of Basal Eurasian ancestry—shares more alleles with one 
test individual or another by computing statistics of the form D(Test1,
Test2; Ust’-Ishim, Mbuti), we find that the statistic is consistent with
zero when the Test populations are any pre-Neolithic Europeans or
present-day east Asians. This would not be expected if some of the
pre-Neolithic Europeans, including Kostenki14, had Basal Eurasian
ancestry (Supplementary Information section 8). We also find no
evidence for the suggestion that the Mal’ta1 lineage contributed to Upper
Palaeolithic Europeans, because when we compute the statistic D(Test1,
Test2; Mal’ta1, Mbuti), we find that the statistic is indistinguishable
from zero when the Test populations are any pre-Neolithic Europeans
beginning with Kostenki14, consistent with descent from a single
founder population since separation from the lineage leading to Mal’ta1
(Supplementary Information section 9). A corollary of this finding is
that the widespread presence of Mal’ta1-related ancestry in
present-day Europeans is probably explained by migrations from the Eurasian
steppe in the Neolithic and Bronze Age periods.
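
For readers unfamiliar with the D-statistics used in this section, the sketch below computes D(Test1, Test2; Y, Outgroup) from per-SNP allele frequencies using the standard normalized form: the sum of (p1 - p2)(p3 - p4) divided by the sum of (p1 + p2 - 2 p1 p2)(p3 + p4 - 2 p3 p4). The function name and the use of None for missing data are assumptions of the example. With this sign convention the value is positive when Test1 shares more alleles with Y than Test2 does, and it is expected to be zero when Test1 and Test2 are symmetrically related to Y.

```python
def d_statistic(p1, p2, p3, p4):
    """D(pop1, pop2; pop3, pop4) from per-SNP allele frequencies.

    Each argument is a list of frequencies of the same allele, one entry
    per SNP; SNPs with a None in any population are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        if a is None or b is None or c is None or d is None:
            continue
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator
```

Judging whether a value differs from zero needs a standard error, which in this kind of analysis comes from a block jackknife over the genome.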


Resurfacing of a European lineage in the Glacial Maximum 
Among the newly reported individuals, GoyetQ116-1 from present-day
Belgium is the oldest at ~35,000 years ago. This individual is
similar to the ~37,000-year-old Kostenki14 and all later individuals in
that it shares more alleles with present-day Europeans (for example,
French) than with east Asians (for example, Han). In contrast, Ust’-Ishim
and Oase1, which predate GoyetQ116-1 and Kostenki14, do not show
any distinctive affinity to later Europeans (Extended Data Table 6).
Thus, from about 37,000 years ago, populations in Europe shared at
least some ancestry with present-day Europeans. However, GoyetQ116-1
differs from Kostenki14 and from all individuals of the succeeding
Věstonice Cluster in that both f3-statistics (Fig. 3; Extended Data Fig. 2)
and D-statistics show that it shares more alleles with members
of the El Mirón Cluster who lived 19,000-14,000 years ago than
with other pre-Neolithic Europeans (Supplementary Information
section 10). Thus, GoyetQ116-1 has an affinity to individuals who lived
more than 15,000 years later. While at least half of the ancestry of all
El Mirón Cluster individuals comes from the lineage represented by
GoyetQ116-1, this proportion varies among individuals, with the largest
amount found outside Iberia (Z = -4.8) (Supplementary Information
section 10).


Europe and the Near East drew together around 14 ka 
Beginning around 14,000 years ago with the Villabruna Cluster, the
strong affinity to GoyetQ116-1 seen in El Mirón Cluster individuals
who belong to the Late Glacial Magdalenian culture becomes
greatly attenuated (Supplementary Information section 10). To test
if this change might reflect gene flow from populations that did not
descend from the >37,000-year-old European founder population,
we computed statistics of the form D(Early European, Later European;
Y, Mbuti) where Y are various present-day non-Africans. If no gene
flow from exogenous populations occurred, this statistic is expected
to be zero. Figure 4b shows that it is consistent with zero (|Z| < 3)
for nearly all individuals dating to between about 37,000 and 14,000
years ago. However, beginning with the Villabruna Cluster, it becomes
highly significantly negative in comparisons where the
non-European population (Y) is Near Easterners (Fig. 4b; Extended Data Fig. 3;
Supplementary Information section 11). This must reflect a
contribution to the Villabruna Cluster from a lineage also found in present-day
Near Easterners (Fig. 4b).
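
The |Z| < 3 criterion used here depends on a standard error for D. A minimal sketch of how such a Z-score can be obtained with a delete-one block jackknife is given below. It assumes equal-weight blocks, whereas published analyses typically use a weighted variant that accounts for the number of SNPs per block. The inputs (per-block sums of the D numerator and denominator) and both function names are hypothetical.

```python
import math


def jackknife_z(block_num, block_den):
    """Return (D, standard error, Z) from per-block numerator/denominator sums."""
    n = len(block_num)
    if n < 2:
        raise ValueError("need at least two blocks")
    total_num = sum(block_num)
    total_den = sum(block_den)
    d_full = total_num / total_den
    # Recompute D with each genomic block left out in turn.
    leave_one_out = [
        (total_num - bn) / (total_den - bd)
        for bn, bd in zip(block_num, block_den)
    ]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


def consistent_with_zero(z, threshold=3.0):
    """Apply the |Z| < 3 rule used in the text."""
    return abs(z) < threshold
```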

The Satsurblia Cluster individuals from the Caucasus dating to
~13,000-10,000 years ago share more alleles with the Villabruna
Cluster individuals than they do with earlier Europeans, indicating that
they are related to the population that contributed new alleles to people
in the Villabruna Cluster, although they cannot be the direct source of
the gene flow. One reason for this is that the Satsurblia Cluster carries
large amounts of Basal Eurasian ancestry while Villabruna Cluster
individuals do not (Supplementary Information section 12; Extended Data
Fig. 4). One possible explanation for the sudden drawing together of
the ancestry of Europe and the Near East at this time is long-distance
migrations from the Near East into Europe. However, a plausible
alternative is population structure, whereby Upper Palaeolithic Europe
harboured multiple groups that differed in their relationship to the Near
East, with the balance shifting among groups as a result of demographic
changes after the Glacial Maximum.

The Villabruna Cluster is represented by the largest number of
individuals in this study. This allows us to study heterogeneity within this
cluster (Supplementary Information section 13). First, we detect
differences in the degree of allele sharing with members of the El Mirón
Cluster, as revealed by significant statistics of the form D(Test1, Test2;
El Mirón Cluster, Mbuti). Second, we detect an excess of allele
sharing with east Asians in a subset of Villabruna Cluster individuals,
beginning with an ~13,000-year-old individual from Switzerland, as
revealed by significant statistics of the form D(Test1, Test2; Han, Mbuti)
(Fig. 4b and Extended Data Fig. 3). For example, Han Chinese share
more alleles with two Villabruna Cluster individuals (Loschbour and
LaBrana1) than they do with Kostenki14, as reflected in significantly
negative statistics of the form D(Kostenki14, Loschbour/LaBrana1; Han,
Mbuti). This statistic was originally interpreted as evidence of Basal
Eurasian ancestry in Kostenki14. However, because this statistic is
consistent with zero when Han is replaced with Ust’-Ishim, these findings
cannot be driven by Basal Eurasian ancestry (as we discuss earlier),
and must instead be driven by gene flow between populations related
to east Asians and the ancestors of some Europeans (Supplementary
Information section 8).


Conclusions 

We show that the population history of pre-Neolithic Europe was
complex in several respects. First, at least some of the initial modern
humans to appear in Eurasia, exemplified by Ust’-Ishim and Oase1,
failed to contribute appreciably to the current European gene pool.
Only from around 37,000 years ago do all the European individuals
analysed share ancestry with present-day Europeans. Second, from
the time of Kostenki14 about 37,000 years ago until the time of the
Villabruna Cluster about 14,000 years ago, all individuals seem to
derive from a single ancestral population with no evidence of
substantial genetic influx from elsewhere. It is interesting that during this
time, the Mal’ta Cluster is not represented in any of the individuals
we sampled from Europe. Thus, while individuals assigned to the
Gravettian cultural complex in Europe are associated with the Věstonice
Cluster, there is no genetic connection between them and the Mal’ta1
individual in Siberia, despite the fact that Venus figurines are associated
with both. This suggests that if this similarity is not a coincidence,
it reflects diffusion of ideas rather than movements of people. Third,
we find that GoyetQ116-1 derives from a different deep branch of the
European founder population than the Věstonice Cluster, which became
predominant in many places in Europe between 34,000 and 26,000
years ago, including at Goyet. GoyetQ116-1 is chronologically associated
with the Aurignacian cultural complex. Thus, the subsequent spread
of the Věstonice Cluster shows that the diffusion of the Gravettian
cultural complex was mediated at least in part by population movements.
Fourth, the population represented by GoyetQ116-1 did not disappear,
as its descendants became widespread again after ~19,000 years ago
in the El Mirón Cluster, when we detect them in Iberia. The El Mirón
Cluster is associated with the Magdalenian culture and may represent
a post-Glacial Maximum expansion from southwestern European
refugia. Fifth, beginning with the Villabruna Cluster at least ~14,000
years ago, all European individuals analysed show an affinity to the
Near East. This correlates in time with the Bølling-Allerød interstadial,
the first significant warming period after the Glacial Maximum.
Archaeologically, it correlates with cultural transitions within the
Epigravettian in southern Europe and the Magdalenian-to-Azilian
transition in western Europe. Thus, the appearance of the Villabruna
Cluster may reflect migrations or population shifts within Europe at
the end of the Ice Age, an observation that is also consistent with the
evidence of mitochondrial DNA turnover. One scenario that could
explain these patterns is a population expansion from southeastern
European or west Asian refugia after the Glacial Maximum, drawing
together the genetic ancestry of Europe and the Near East. Sixth, within
the Villabruna Cluster, some, but not all, individuals have an affinity to
east Asians. An important direction for future work will be to generate
similar ancient DNA data from southeastern Europe and the Near East
to arrive at a more complete picture of the Upper Palaeolithic
population history of western Eurasia.


Online Content Methods, along with any additional Extended Data display items and 
Source Data are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Extended Data Figure 1 | A decrease in Neanderthal ancestry in the last
45,000 years. This is similar to Fig. 2, except we use ancestry estimates
from rates of alleles matching to Neanderthal rather than f4-ratios, as [...]
fit excludes Oase1 (as an outlier with recent Neanderthal ancestry) and
Europeans (known to have reduced Neanderthal ancestry). The regression
slope is significantly negative (P = 0.00004, Extended Data Table 3).
[...] European populations to pairs of European hunter-gatherers changes
over time. Statistics were examined of the form D(W, X; Y, Mbuti), with
[...] gatherer, X is another European hunter-gatherer (in chronological order
on the x axis), and Y is a non-European population (see legend).
Extended Data Figure 4 | Three admixture graph models that fit
the data for Satsurblia, an Upper Palaeolithic individual from the
Caucasus. These models use 127,057 SNPs covered in all populations.
Estimated genetic drifts are given along the solid lines in units of f2-
distance (parts per thousand), and estimated mixture proportions
are given along the dotted lines. All three models provide a fit to the
allele frequency correlation data among Mbuti, Ust’-Ishim, Kostenki14,
Vestonice16, Malta1, ElMiron and Satsurblia to within the limits of our
resolution, in the sense that all empirical f2-, f3- and f4-statistics relating
the individuals are within three standard errors of the expectation of the
model. Models in which Satsurblia is treated as unadmixed cannot be fit.
Extended Data Table 1 | The 51 ancient modern humans analysed in this study

Sample Code | Data source | Country | Lat. | Long. | Date (cal. years BP) | Date type (ref.) | Culture | Remain | SNP Panel | Sex | mtDNA haplogroup | Y chrom. haplogroup | Genetic Cluster | Damage restrict | Mean coverage† | SNPs covered
UstIshim | ref. | Russia | 57.43 | 71.10 | 47,480-42,560 | Direct-UF (ref.) | Unassigned | Femur | Shotgun | M | R | K(xLT) | Unassigned | No | 42 | 2,137,615
Oase1 | ref. | Romania | 45.12 | 21.90 | 41,640-37,580 | Direct-UF (ref.) | Unassigned | Mandible | Shotgun | M | N | B | Unassigned | Yes | 0.156 | 285,076
Kostenki14* | New | Russia | 51.23 | 39.30 | 38,680-36,260 | Direct-UF (ref.) | Unassigned | Tibia | 3.7M | M | U2 | C1b | Unassigned | No | 16.1 | 1,774,156
GoyetQ116-1 | New | Belgium | 50.26 | 4.28 | 35,160-34,430 | Direct-NotUF (ref.) | Aurignacian | Humerus | 1240k | M | M | C1a | Unassigned | No | 1.046 | 846,983
Muierii2 | New | Romania | 45.11 | 23.46 | 33,760-32,840 | Direct-UF (ref.) | Unassigned | Temporal | 3.7M | F | U6 |  | Unassigned | Yes | 0.049 | 98,618
Paglicci133 | New | Italy | 41.65 | 15.61 | 34,580-31,210 | Layer (ref.) | Gravettian | Tooth | 1240k | M | U8c | I | Věstonice | No | 0.041 | 82,330
Cioclovina1 | New | Romania | 45.35 | 23.84 | 33,090-31,780 | Direct-UF (ref.) | Unassigned | Cranium | 1240k | M | U | CT | Unassigned | Yes | 0.006 | 12,784
Kostenki12 | New | Russia | 51.23 | 39.30 | 32,990-31,840 | Layer (ref.) | Unassigned | Cranium | 3.7M | M | U2 | CT | Unassigned | No | 0.03 | 61,228
KremsWA3 | New | Austria | 48.41 | 15.59 | 31,250-30,690 | Layer (ref.) | Gravettian | Cranium | 1240k | M | U5 |  | Věstonice | No | 0.11 | 203,986
Vestonice13 | New | Czech Republic | 48.53 | 16.39 | 31,070-30,670 | Layer (ref.) | Gravettian | Femur | 3.7M | M | U8c | CT(notIJK) | Věstonice | Yes | 0.071 | 139,568
Vestonice15 | New | Czech Republic | 48.53 | 16.39 | 31,070-30,670 | Layer (ref.) | Gravettian | Femur | 3.7M | M | U5 | BT | Věstonice | Yes | 0.015 | 30,900
Vestonice14 | New | Czech Republic | 48.53 | 16.39 | 31,070-30,670 | Layer (ref.) | Gravettian | Femur | 390k | M | U |  | Věstonice | Yes | 0.003 | 5,677
Pavlov1 | New | Czech Republic | 48.53 | 16.39 | 31,110-29,410 | Layer (ref.) | Gravettian | Femur | 3.7M | M | U5 | C1a2 | Věstonice | Yes | 0.028 | 57,005
Vestonice43 | New | Czech Republic | 48.53 | 16.39 | 30,710-29,310 | Layer (ref.) | Gravettian | Femur | 3.7M | M | U | F | Věstonice | Yes | 0.087 | 163,946
Vestonice16 | New | Czech Republic | 48.53 | 16.39 | 30,710-29,310 | Layer (ref.) | Gravettian | Femur | 3.7M | M | U5 | IJK | Věstonice | No | 1.31 | 945,292
Ostuni2 | New | Italy | 40.73 | 17.57 | 29,310-28,640 | Direct-UF (New) | Gravettian | Femur | 3.7M | F | U2 |  | Věstonice | Yes | 0.008 | 17,017
GoyetQ53-1 | New | Belgium | 50.26 | 4.28 | 28,230-27,720 | Direct-NotUF (ref.) | Gravettian | Fibula | 1240k | F | U2 |  | Věstonice | Yes | 0.006 | 12,567
Paglicci108 | New | Italy | 41.65 | 15.61 | 28,430-27,070 | Layer (ref.) | Gravettian | Phalanx | 1240k | F | U2'3'4'7'8'9 |  | Věstonice | Yes | 0.002 | 4,330
Ostuni1 | New | Italy | 40.73 | 17.57 | 27,810-27,430 | Direct-UF (New) | Gravettian | Tibia | 3.7M | F | M |  | Věstonice | Yes | 0.245 | 369,313
GoyetQ376-19 | New | Belgium | 50.26 | 4.28 | 27,720-27,310 | Direct-NotUF (ref.) | Gravettian | Humerus | 1240k | F | U2 |  | Věstonice | Yes | 0.012 | 25,400
GoyetQ56-16 | New | Belgium | 50.26 | 4.28 | 26,600-26,040 | Direct-NotUF (ref.) | Gravettian | Fibula | 1240k | F | U2 |  | Věstonice | Yes | 0.005 | 9,988
Malta1 | ref. | Russia | 52.9 | 103.5 | 24,520-24,090 | Direct-UF (ref.) | Unassigned | Humerus | Shotgun | M | U | R | Mal’ta | No | 1.174 | 1,439,501
ElMiron | New | Spain | 43.26 | -3.45 | 18,830-18,610 | Direct-UF (ref.) | Magdalenian | Toe | 3.7M | F | U5b |  | El Mirón | Yes | 1.012 | 797,714
AfontovaGora3 | New | Russia | 56.05 | 92.87 | 16,930-16,490 | Layer (ref.) | Unassigned | Tooth | 3.7M | F | R1b |  | Mal’ta | Yes | 0.17 | 286,355
AfontovaGora2 | ref. | Russia | 56.05 | 92.87 | 16,930-16,490 | Direct-UF (ref.) | Unassigned | Humerus | Shotgun | M |  |  | Mal’ta | No | 0.071 | 143,751
Rigney1 | New | France | 47.23 | 6.10 | 15,690-15,240 | Direct-NotUF (ref.) | Magdalenian | Mandible | 1240k | F | U2'3'4'7'8'9 |  | El Mirón | Yes | 0.017 | 35,600
HohleFels49 | New | Germany | 48.22 | 9.45 | 16,000-14,260 | Layer (ref.) | Magdalenian | Femur | 390k | M | U8a | I | El Mirón | Yes | 0.033 | 63,151
GoyetQ-2 | New | Belgium | 50.26 | 4.28 | 15,230-14,780 | Direct-NotUF (ref.) | Magdalenian | Humerus | 1240k | M | U8a | HIJK | El Mirón | Yes | 0.035 | 72,263
Brillenhohle | New | Germany | 48.24 | 9.46 | 15,120-14,440 | Direct-UF (ref.) | Magdalenian | Cranium | 390k | M | U8a |  | El Mirón | Yes | 0.006 | 13,459
HohleFels79 | New | Germany | 48.22 | 9.45 | 15,070-14,270 | Direct-UF (ref.) | Magdalenian | Cranium | 390k | M | U8a |  | El Mirón | Yes | 0.005 | 11,211
Burkhardtshohle | New | Germany | 48.32 | 9.35 | 15,080-14,150 | Direct-NotUF (ref.) | Magdalenian | Cranium | 1240k | M | U8a | T | El Mirón | Yes | 0.018 | 38,376
Villabruna | New | Italy | 46.15 | 12.21 | 14,180-13,780 | Direct-UF (ref.) | Epigravettian | Femur | 3.7M | M | U5b2b | R1b1 | Villabruna | No | 3.137 | 1,215,433
Bichon | ref. | Switzerland | 47.01 | 6.79 | 13,770-13,560 | Direct-UF (ref.) | Azilian | Petrous | Shotgun | M | U5b1h | 2 | Villabruna | No | 8.119 | 2,116,782
Satsurblia | ref. | Georgia | 42.24 | 42.92 | 13,380-13,130 | Direct-UF (ref.) | Epigravettian | Petrous | Shotgun | M | K3 | 2 | Satsurblia | No | 1.195 | 1,460,368
Rochedane | New | France | 47.21 | 6.45 | 13,090-12,830 | Direct-NotUF (ref.) | Epipaleolithic | Mandible | 1240k | M | U5b2b | I | Villabruna | No | 0.131 | 237,390
Iboussieres39 | New | France | 44.29 | 4.46 | 12,040-11,410 | Direct-NotUF (ref.) | Epipaleolithic | Femur | 390k | M | U5b2b |  | Villabruna | Yes | 0.005 | 9,659
Continenza | New | Italy | 41.96 | 13.54 | 11,200-10,510 | Layer (New) | Mesolithic | Cranium | 3.7M | F | U5b1 |  | Villabruna | Yes | 0.006 | 11,717
Ranchot88 | New | France | 47.91 | 5.43 | 10,240-9,930 | Direct-NotUF (ref.) | Mesolithic | Cranium | 1240k | F | U5b1 |  | Villabruna | Yes | 0.322 | 414,863
LesCloseaux13 | New | France | 48.52 | 2.11 | 10,240-9,560 | Direct-NotUF (ref.) | Mesolithic | Femur | 1240k | F | U5a2 |  | Villabruna | Yes | 0.004 | 8,635
Kotias | ref. | Georgia | 42.13 | 43.12 | 9,890-9,550 | Direct-UF (ref.) | Mesolithic | Tooth | Shotgun | M | H13c | J | Satsurblia | No | 12.157 | 2,133,968
Falkenstein | New | Germany | 48.06 | 9.04 | 9,410-8,990 | Direct-NotUF (ref.) | Mesolithic | Fibula | 390k | M | U5a2c | F | Villabruna | Yes | 0.033 | 64,428
Karelia | ref. | Russia | 61.65 | 35.65 | 8,800-7,950 | Layer (ref.) | Mesolithic | Tooth | Shotgun | M | C1g | R1a1 | Unassigned | No | 1.952 | 1,754,410
Bockstein | New | Germany | 48.33 | 10.09 | 8,370-8,160 | Layer (ref.) | Mesolithic | Tooth | 390k | F | U5b1d1 |  | Villabruna | Yes | 0.011 | 21,977
Ofnet | New | Germany | 48.49 | 10.27 | 8,430-8,060 | Layer (ref.) | Mesolithic | Tooth | 390k | F | U5b1d1 |  | Villabruna | Yes | 0.003 | 6,263
Chaudardes1 | New | France | 49.24 | 3.46 | 8,360-8,050 | Direct-NotUF (ref.) | Mesolithic | Tibia | 1240k | M | U5b1b | IT | Villabruna | Yes | 0.046 | 92,657
Loschbour | ref. | Luxembourg | 49.70 | 6.24 | 8,160-7,940 | Direct-UF (ref.) | Mesolithic | Tooth | Shotgun | M | U5b1a | I2a1b | Villabruna | No | 20 | 2,091,584
LaBrana1 | ref. | Spain | 42.93 | -5.35 | 7,940-7,690 | Direct-UF (ref.) | Mesolithic | Tooth | Shotgun | M | U5b2c1 | C1a2 | Villabruna | No | 3.338 | 1,884,745
Hungarian.KO1 | ref. | Hungarian | 47.93 | 21.20 | 7,730-7,590 | Direct-UF (ref.) | Neolithic | Petrous | Shotgun | M | R3 | I2a | Villabruna | No | Ld | 1,410,303
Motala12 | ref. | Sweden | 58.54 | 15.05 | 7,670-7,580 | Direct-UF (New) | Mesolithic | Tooth | Shotgun | M | U2e1 | I2a1b* | Unassigned | No | 2.185 | 1,874,519
BerryAuBac | New | France | 49.24 | 3.54 | 7,320-7,170 | Direct-NotUF (ref.) | Mesolithic | Radius | 1240k | M | U5b1a | IT | Villabruna | No | 0.027 | 54,690
Stuttgart | ref. | Germany | 48.78 | 9.18 | 7,260-7,020 | Direct-UF (New) | Early Neolithic | Tooth | Shotgun | F | T2c1d1 |  | Unassigned | No | 19 | 2,078,724


Refs 37-57 are cited in this table. All dates are obtained as described in Supplementary Information section 1. When an individual has a direct date from the same skeleton it is marked ‘direct’
followed by a hyphen to indicate whether the date is obtained by ultrafiltration (‘UF’) or without (‘NotUF’). If the date is from the archaeological layer, the date type is marked as ‘layer’. All the dates are
calibrated using IntCal13 (ref. 58) and the OxCal4.2 program (ref. 59).
*Kostenki14 is represented in most analyses by our newly reported 16.1× capture data, but key analyses were repeated on the previously reported 2.8× shotgun data’.
†Mean coverage is computed on the 3.7 million SNP targets.
Extended Data Table 2 | Estimated proportion of Neanderthal ancestry

Sample Code | Age BP | f4-ratios: SNPs | f4-ratios: Est. | f4-ratios: 95% CI | Archaic ancestry informative SNPs: SNPs | Archaic ancestry informative SNPs: Est. | Archaic ancestry informative SNPs: 95% CI | Increase in Neanderthal ancestry with B | S.E.
UstIshim | 45,020 | 2,137,615 | 4.4% | 3.6% - 5.3% | 778,774 | 3.0% | 2.3% - 3.7% | -0.9% | 1.3%
Oase1 | 39,610 | 285,076 | 9.9% | 8.4% - 11.4% | 59,854 | 7.5% | 6.0% - 8.9% | 2.5% | 1.8%
Kostenki14 | 37,470 | 1,774,156 | 3.6% | 2.7% - 4.4% | 632,748 | 2.8% | 2.3% - 3.3% | -1.0% | 1.0%
GoyetQ116-1 | 34,795 | 846,983 | 3.4% | 2.4% - 4.3% |  |  |  |  |
Muierii2 | 33,300 | 98,618 | 5.2% | 3.0% - 7.4% | 22,189 | 3.0% | 2.5% - 3.5% | 0.6% | 1.1%
Paglicci133 | 32,895 | 82,330 | 4.1% | 2.1% - 6.0% |  |  |  |  |
Cioclovina1 | 32,435 | 12,784 | 4.1% | -1.1% - 9.3% |  |  |  |  |
Kostenki12 | 32,415 | 61,228 | 1.9% | -0.7% - 4.4% | 13,385 | 2.6% | 2.1% - 3.2% | 1.7% | 1.5%
KremsWA3 | 30,970 | 203,986 | 3.9% | 2.6% - 5.2% |  |  |  |  |
Vestonice13 | 30,870 | 139,568 | 4.6% | 2.6% - 6.5% | 35,983 | 3.3% | 2.7% - 3.8% | 0.3% | 1.3%
Vestonice15 | 30,870 | 30,900 | 4.3% | 0.6% - 7.9% | 5,855 | 2.7% | 2.1% - 3.4% | -1.5% | 1.3%
Vestonice14 | 30,870 | 5,677 | 2.6% | -5.9% - 11.0% |  |  |  |  |
Pavlov1 | 30,260 | 57,005 | 4.4% | 1.6% - 7.1% | 9,327 | 3.1% | 2.5% - 3.8% | 0.7% | 1.2%
Vestonice43 | 30,010 | 163,946 | 6.9% | 5.2% - 8.5% | 38,749 | 2.9% | 2.4% - 3.3% | 0.9% | 0.9%
Vestonice16 | 30,010 | 945,292 | 4.1% | 3.1% - 5.1% | 268,157 | 2.8% | 2.3% - 3.3% | -0.1% | 1.0%
Ostuni2 | 28,975 | 17,017 | 1.6% | -3.2% - 6.3% | 2,746 | 2.3% | 1.4% - 3.1% | 1.3% | 1.6%
GoyetQ53-1 | 27,975 | 12,567 | 4.8% | -0.7% - 10.3% |  |  |  |  |
Paglicci108 | 27,750 | 4,330 | 3.4% | -6.0% - 12.7% |  |  |  |  |
Ostuni1 | 27,620 | 369,313 | 4.2% | 3.0% - 5.4% | 88,449 | 2.6% | 2.2% - 3.0% | 0.1% | 0.9%
GoyetQ376-19 | 27,518 | 25,400 | 6.5% | 2.7% - 10.2% |  |  |  |  |
GoyetQ56-16 | 26,320 | 9,988 | 3.6% | -1.9% - 9.1% |  |  |  |  |
Malta1 | 24,305 | 1,439,501 | 2.9% | 1.9% - 3.8% | 437,187 | 2.5% | 2.1% - 2.9% | 1.0% | 0.8%
ElMiron | 18,720 | 797,714 | 3.6% | 2.6% - 4.5% | 250,071 | 2.8% | 2.5% - 3.2% | 0.6% | 0.9%
AfontovaGora3 | 16,710 | 286,355 | 3.0% | 1.8% - 4.2% | 96,237 | 3.3% | 2.9% - 3.7% | -1.5% | 1.0%
AfontovaGora2 | 16,710 | 143,751 | 2.2% | 0.4% - 4.0% | 37,280 | 2.3% | 1.9% - 2.7% | -0.3% | 0.9%
Rigney1 | 15,465 | 35,600 | 0.8% | -2.6% - 4.2% |  |  |  |  |
HohleFels49 | 15,130 | 63,151 | 2.3% | -0.6% - 5.2% |  |  |  |  |
GoyetQ-2 | 15,005 | 72,263 | 1.7% | -0.6% - 4.0% |  |  |  |  |
Brillenhohle | 14,780 | 13,459 | 2.5% | -3.0% - 8.1% |  |  |  |  |
HohleFels79 | 14,670 | 11,211 | 1.7% | -5.1% - 8.5% |  |  |  |  |
Burkhardtshohle | 14,615 | 38,376 | 1.7% | -1.6% - 5.0% |  |  |  |  |
Villabruna | 13,980 | 1,215,433 | 2.7% | 1.8% - 3.5% | 425,148 | 3.3% | 3.0% - 3.7% | 1.1% | 0.9%
Bichon | 13,665 | 2,116,782 | 2.9% | 1.9% - 3.8% | 769,422 | 2.7% | 2.2% - 3.2% | 0.7% | 1.3%
Satsurblia | 13,255 | 1,460,368 | 1.5% | 0.6% - 2.4% | 542,561 | 2.0% | 1.7% - 2.4% | 0.9% | 0.6%
Rochedane | 12,960 | 237,390 | 1.9% | 0.5% - 3.3% |  |  |  |  |
Iboussieres39 | 11,725 | 9,659 | 6.4% | -0.8% - 13.7% |  |  |  |  |
Continenza | 10,855 | 11,717 | 4.1% | -1.4% - 9.6% | 1,733 | 2.9% | 1.8% - 4.0% | -10.6% | 4.4%
Ranchot88 | 10,085 | 414,863 | 2.9% | 1.8% - 4.0% |  |  |  |  |
LesCloseaux13 | 9,900 | 8,635 | = | -9.7% - 3.8% |  |  |  |  |
Kotias | 9,720 | 2,133,968 | 1.8% | 1.0% - 2.7% | 779,146 | 2.1% | 1.8% - 2.4% | 0.7% | 0.5%
Falkenstein | 9,200 | 64,428 | 4.8% | 1.7% - 7.8% |  |  |  |  |
Karelia | 8,375 | 1,754,410 | 1.9% | 1.1% - 2.7% | 582,444 | 2.2% | 1.9% - 2.6% | -0.2% | 0.7%
Bockstein | 8,265 | 21,977 | 5.7% | 1.0% - 10.5% |  |  |  |  |
Ofnet | 8,245 | 6,263 | 9.8% | 1.4% - 18.1% |  |  |  |  |
Chaudardes1 | 8,205 | 92,657 | 1.9% | -0.2% - 3.9% |  |  |  |  |
Loschbour | 8,050 | 2,091,584 | 2.5% | 1.6% - 3.3% | 774,139 | 2.6% | 2.0% - 3.1% | 2.7% | 1.7%
LaBrana1 | 7,815 | 1,884,745 | 1.9% | 1.1% - 2.8% | 642,231 | 2.7% | 2.3% - 3.2% | 0.4% | 0.8%
Hungarian.KO1 | 7,660 | 1,410,303 | 2.1% | 1.2% - 3.0% | 439,408 | 2.4% | 2.0% - 2.8% | -0.1% | 1.2%
Motala12 | 7,625 | 1,874,519 | 2.5% | 1.6% - 3.3% | 655,685 | 2.3% | 1.9% - 2.7% | -0.1% | 0.7%
BerryAuBac | 7,245 | 54,690 | 2.5% | -0.2% - 5.1% |  |  |  |  |
Stuttgart | 7,140 | 2,078,724 | 1.9% | 1.1% - 2.7% | 767,813 | 2.1% | 1.8% - 2.5% | 0.0% | 0.7%
Dai | 0 | 2,144,502 | 1.4% | 0.7% - 2.1% | 782,066 | 1.8% | 1.5% - 2.1% | 1.4% | 0.4%
Han | 0 | 2,144,502 | 1.8% | 1.1% - 2.5% | 782,164 | 2.1% | 1.8% - 2.5% | 1.9% | 0.7%
English | 0 | 2,144,502 | 1.5% | 0.8% - 2.2% |  |  |  |  |
French | 0 | 2,144,502 | 1.5% | 0.9% - 2.1% | 782,386 | 1.7% | 1.4% - 1.9% | 1.4% | 0.6%
Sardinian | 0 | 2,144,502 | 1.2% | 0.6% - 1.9% | 782,351 | 1.7% | 1.4% - 2.0% | 0.7% | 0.5%
Karitiana | 0 |  |  |  | 782,037 | 2.1% | 1.7% - 2.4% | 1.5% | 1.0%
Extended Data Table 3 | Significant correlation of Neanderthal ancestry estimate with specimen age


P-value for Decrease in Estimate of Neanderthal ancestry at different time points
f4-ratio estimates
Core Set 1 (all ancient samples (except Oase1) + Han + Dai) 57 5x 107 0.48-0.73% 1.1-2.2% 4.0-5.4% 4.3-5.7% 4.5-6.0%
Subset of Core Set 1 (<32kya) 50 2x 10% 0.59-0.98% 0.9-2.1% 4.5-6.4% 4.8-6.9% 5.1-7.4%
Subset of Core Set 1 (>32kya or <25kya) 44 4x 108 0.44-0.69% 1.0-2.2% 3.7-5.2% 4.0-5.5% 4.2-5.8%
Subset of Core Set 1 (>25kya or <14kya) 47 5x 107 0.48-0.73% 1.0-2.2% 3.9-5.3% 4.2-5.7% 4.5-6.0%
Subset of Core Set 1 (>14kya or present day) 37 2x 10" 0.47-0.74% 1.1-2.4% 4.1-5.5% 4.3-5.8% 4.6-6.2%
Subset of Core Set 1 (only ancient samples) 50 4x 10% 0.46-0.76% 1.0-2.3% 4.0-5.4% 4.3-5.8% 4.5-6.1%
Subset of Core Set 1 (individuals with >200,000 SNPs) 28 4x10 0.46-0.71% 1.1-2.3% 3.9-5.3% 4.2-5.7% 4.4-6.0%
Modification of Core Set 1 (replace East Asians with Europeans) 58 2x 107 0.49-0.73% 1.1-2.3% 4.0-5.4% 4.3-5.8% 4.6-6.1%
All ancient samples including Oase1 + Han + Dai 58 8x 10” 0.57-0.81% 1.0-2.2% 4.3-5.7% 4.7-6.1% 5.0-6.5%
All ancient samples 51 1x10” 0.57-0.86% 0.9-2.2% 4.4-5.8% 4.7-6.2% 5.0-6.6%
All ancient samples except Oase1 or UstIshim 49 8x 10 0.45-0.81% 1.0-2.3% 4.0-5.6% 4.2-6.0% 4.5-6.4%
Ancestry informative SNPs
Core Set 2 (all ancient samples (except Oase1) + Han + Dai + Karitiana) 29 4x 101! 0.21-0.39% 1.8-2.3% 3.1-4.0% 3.2-4.2% 3.3-4.3%
Subset of Core Set 2 (no Han, Dai, Karitiana, Stuttgart) 25 1x 107 0.11-0.36% 1.8-2.5% 2.9-3.8% 3.0-4.0% 3.0-4.1%
Subset of Core Set 2 (no Han, Dai, Karitiana, Stuttgart, UstIshim) 24 2x 107 0.11-0.37% 1.8-2.5% 2.9-3.8% 2.9-4.0% 3.0-4.2%


‘Core set 1’ used for the f4-ratio analyses, refers to 50 ancient individuals (removing Oase1 as an outlier) along with 7 east Asians (Dai and Han). ‘Core set 2’ used for the analyses of Neanderthal 
ancestry informative SNPs, refers to 26 ancient individuals (removing Oase1, Han, Dai and Karitiana). 
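
As a rough illustration of the kind of trend test summarized in Extended Data Table 3, the sketch below fits an ordinary least-squares line to a handful of f4-ratio estimates taken from Extended Data Table 2. The choice of six samples, the unweighted fit and the helper name `ols_slope` are assumptions made for illustration only; the analysis behind the table may weight individuals or estimate uncertainty differently.

```python
# Illustrative only: unweighted least-squares fit of Neanderthal ancestry
# (f4-ratio estimate, in %) against time, using a few rows of
# Extended Data Table 2. Not the procedure used to produce Table 3.

def ols_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# (age in years BP, f4-ratio estimate in %) for UstIshim, Kostenki14, Malta1,
# Villabruna, Loschbour and present-day Han.
samples = [(45020, 4.4), (37470, 3.6), (24305, 2.9),
           (13980, 2.7), (8050, 2.5), (0, 1.8)]

# Time runs forward, so use minus the age: a negative slope then means
# that the estimated ancestry declined towards the present.
times = [-age for age, _ in samples]
estimates = [est for _, est in samples]
slope, intercept = ols_slope(times, estimates)
print(f"slope: {slope * 10000:.2f} percentage points per 10,000 years")
```

With time measured forwards, a negative slope corresponds to the decline in Neanderthal ancestry described in the table title.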
Extended Data Table 4 | Sex determination for newly reported individuals

Sample | Target | Type | Nauto | Nx | Ny | Nx/Nauto | Ny/Nauto | X-rate | Y-rate | Sex
Genome-wide target set | 1240k or 2.2M |  | 1151240 | 49711 | 32681 | 0.0432 | 0.0284 |  |  |
Genome-wide target set | 390k |  | 388745 | 1819 | 2242 | 0.0047 | 0.0058 |  |  |
Kostenki14 | 2.2M | all | 29633405 | 395534 | 262846 | 0.0133 | 0.0089 | 0.309 | 0.312 | M
GoyetQ116-1 | 1240k | all | 2122620 | 36391 | 22256 | 0.0171 | 0.0105 | 0.397 | 0.369 | M
Cioclovina1 | 1240k | Damage | 11521 | 184 | 125 | 0.0160 | 0.0108 | 0.370 | 0.382 | M
Kostenki12 | 2.2M | Subset | 63908 | 856 | 504 | 0.0134 | 0.0079 | 0.310 | 0.278 | M
Muierii2 | 2.2M | Damage | 81165 | 2177 | 8 | 0.0268 | 0.0001 | 0.621 | 0.003 | F
Vestonice13 | 2.2M | Damage | 119094 | 1578 | 1059 | 0.0133 | 0.0089 | 0.307 | 0.313 | M
Vestonice15 | 2.2M | Damage | 28762 | 338 | 227 | 0.0118 | 0.0079 | 0.272 | 0.278 | M
Vestonice14 | 390k | Damage | 4846 | 8 | 11 | 0.0017 | 0.0023 | 0.353 | 0.394 | M
Vestonice43 | 2.2M | Damage | 136933 | 1826 | 1204 | 0.0133 | 0.0088 | 0.309 | 0.310 | M
Pavlov1 | 2.2M | Damage | 54429 | 631 | 404 | 0.0116 | 0.0074 | 0.268 | 0.261 | M
Vestonice16 | 2.2M | Subset | 2433741 | 30463 | 20976 | 0.0125 | 0.0086 | 0.290 | 0.304 | M
KremsWA3 | 1240k | all | 235069 | 4119 | 2661 | 0.0175 | 0.0113 | 0.406 | 0.399 | M
Ostuni2 | 2.2M | Damage | 15749 | 138 | 1 | 0.0088 | 0.0001 | 0.203 | 0.002 | F
Ostuni1 | 2.2M | Damage | 427199 | 10868 | 47 | 0.0254 | 0.0001 | 0.589 | 0.004 | F
Paglicci108 | 1240k | Damage | 3883 | 124 | 2 | 0.0319 | 0.0005 | 0.740 | 0.018 | F
GoyetQ53-1 | 1240k | Damage | 10771 | 311 | 4 | 0.0289 | 0.0004 | 0.669 | 0.013 | F
GoyetQ376-19 | 1240k | Damage | 20052 | 680 | 10 | 0.0339 | 0.0005 | 0.785 | 0.018 | F
GoyetQ56-16 | 1240k | Damage | 8702 | 304 | 7 | 0.0349 | 0.0008 | 0.809 | 0.028 | F
Paglicci133 | 1240k | Subset | 81092 | 1641 | 983 | 0.0202 | 0.0121 | 0.469 | 0.427 | M
ElMiron | 2.2M | Damage | 1765696 | 40647 | 196 | 0.0230 | 0.0001 | 0.533 | 0.004 | F
HohleFels79 | 390k | Damage | 10188 | 28 | 22 | 0.0027 | 0.0022 | 0.587 | 0.374 | M
AfontovaGora3 | 2.2M | Damage | 291798 | 8705 | 37 | 0.0298 | 0.0001 | 0.691 | 0.004 | F
HohleFels49 | 390k | Damage | 61051 | 113 | 111 | 0.0019 | 0.0018 | 0.396 | 0.315 | M
Rigney1 | 1240k | Damage | 32797 | 1131 | 9 | 0.0345 | 0.0003 | 0.799 | 0.010 | F
GoyetQ-2 | 1240k | Damage | 65563 | 1123 | 706 | 0.0171 | 0.0108 | 0.397 | 0.379 | M
Brillenhohle | 390k | Damage | 12603 | 22 | 22 | 0.0017 | 0.0017 | 0.373 | 0.303 | M
Burkhardtshohle | 1240k | Damage | 34207 | 563 | 407 | 0.0165 | 0.0119 | 0.381 | 0.419 | M
Villabruna | 2.2M | Subset | 5505838 | 72055 | 52110 | 0.0131 | 0.0095 | 0.303 | 0.333 | M
Rochedane | 1240k | Subset | 256325 | 4780 | 2830 | 0.0186 | 0.0110 | 0.432 | 0.389 | M
Continenza | 2.2M | Damage | 10647 | 208 | 2 | 0.0195 | 0.0002 | 0.452 | 0.007 | F
Iboussieres39 | 390k | Damage | 8246 | 12 | 22 | 0.0015 | 0.0027 | 0.311 | 0.463 | M
Ranchot88 | 1240k | Damage | 594962 | 18520 | 119 | 0.0311 | 0.0002 | 0.721 | 0.007 | F
LesCloseaux13 | 1240k | Damage | 7326 | 275 | 2 | 0.0375 | 0.0003 | 0.869 | 0.010 | F
Falkenstein | 390k | Damage | 58970 | 113 | 102 | 0.0019 | 0.0017 | 0.410 | 0.300 | M
Bockstein | 390k | Damage | 20214 | 62 | 0 | 0.0031 | 0.0000 | 0.655 | 0.000 | F
Ofnet | 390k | Damage | 5294 | 13 | 1 | 0.0025 | 0.0002 | 0.525 | 0.033 | F
Chaudardes1 | 1240k | Damage | 84052 | 1429 | 865 | 0.0170 | 0.0103 | 0.394 | 0.363 | M
BerryAuBac | 1240k | all | 49670 | 902 | 554 | 0.0182 | 0.0112 | 0.421 | 0.393 | M


*We restrict analysis to the 1240k target set for study of the 2.2M capture datasets. 


Y-rate is the ratio of Ny/Nauto divided by the same quantity for the genome-wide target set. Female sex (F) is inferred as Y-rate <0.05 and male sex (M) as Y-rate >0. 
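
The Y-rate rule in the footnote can be sketched as follows, using counts from the table. The function names are hypothetical. The male cutoff is truncated in the text above, so it is left as a caller-supplied parameter, and the value 0.2 used in the example is a placeholder assumption rather than the published threshold.

```python
# Illustrative sketch of the Y-rate rule from the Extended Data Table 4 footnote.

def y_rate(n_auto, n_y, ref_n_auto, ref_n_y):
    """Ny/Nauto for a sample divided by the same ratio for its whole target set."""
    return (n_y / n_auto) / (ref_n_y / ref_n_auto)

def infer_sex(rate, female_max=0.05, male_min=None):
    """Return 'F', 'M' or 'undetermined' for a given Y-rate.

    female_max follows the footnote (Y-rate < 0.05). The male cutoff is
    truncated in the source, so male_min must be supplied by the caller.
    """
    if rate < female_max:
        return "F"
    if male_min is not None and rate > male_min:
        return "M"
    return "undetermined"

# Counts for the '1240k or 2.2M' target set (first row of the table).
REF_N_AUTO, REF_N_Y = 1151240, 32681

kostenki14 = y_rate(29633405, 262846, REF_N_AUTO, REF_N_Y)
muierii2 = y_rate(81165, 8, REF_N_AUTO, REF_N_Y)

# male_min=0.2 is a placeholder assumption, not a value taken from the paper.
print(round(kostenki14, 3), infer_sex(kostenki14, male_min=0.2))
print(round(muierii2, 3), infer_sex(muierii2, male_min=0.2))
```

Computed this way, the two rates agree with the Y-rate column for Kostenki14 and Muierii2 to three decimal places.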
Extended Data Table 5 | Allele counts at SNPs affected by selection in individuals with >1-fold coverage

Sample | Measure | LCT | SLC45A2 | SLC24A5 | EDAR | HERC2
SNP |  | rs4988235 | rs16891982 | rs1426654 | rs3827760 | rs12913832
Ancestral G A
Derived A G
UstIshim | Coverage | 31 | 46 | 52 | 42 | 50
UstIshim | Derived allele frequency | 0% | 0% | 2% | 0% | 0%
Kostenki14 | Coverage | 140 | 113 | 6 | 45 | 52
Kostenki14 | Derived allele frequency | 0% | 2% | 17% | 0% | 0%
GoyetQ116-1 | Coverage | 8 | 6 | 0 | 9 | 1
GoyetQ116-1 | Derived allele frequency | 0% | 0% | n/a | 0% | 0%
Vestonice16 | Coverage | 13 | 18 | 0 | 4 | 5
Vestonice16 | Derived allele frequency | 0% | 6% | n/a | 0% | 0%
Malta1 | Coverage | 1 | 0 | 2 | 2 | 2
Malta1 | Derived allele frequency | 0% | n/a | 0% | 0% | 0%
ElMiron | Coverage | 2 | 10 | 0 | 7 | 5
ElMiron | Derived allele frequency | 0% | 0% | n/a | 0% | 0%
Villabruna | Coverage | 17 | 52 | 5 | 19 | 10
Villabruna | Derived allele frequency | 0% | 0% | 0% | 0% | 100%
Bichon | Coverage | 11 | 4 | 25 | 16 | 9
Bichon | Derived allele frequency | 0% | 0% | 0% | 0% | 33%
Satsurblia | Coverage | 1 | 2 | 4 | 1 | 4
Satsurblia | Derived allele frequency | 0% | 0% | 100% | 0% | 50%
Kotias | Coverage | 16 | 22 | 13 | 20 | 15
Kotias | Derived allele frequency | 0% | 0% | 100% | 0% | 0%
Loschbour | Coverage | 19 | 18 | 20 | 17 | 21
Loschbour | Derived allele frequency | 0% | 0% | 0% | 0% | 100%
LaBrana1 | Coverage | 8 | 6 | 2 | 11 | 3
LaBrana1 | Derived allele frequency | 12% | 0% | 0% | 0% | 100%
Hungarian.KO1 | Coverage | 1 | 2 | 2 | 1 | 2
Hungarian.KO1 | Derived allele frequency | 0% | 0% | 50% | 0% | 100%
Motala12 | Coverage | 2 | 0 | 3 | 3 | 1
Motala12 | Derived allele frequency | 0% | n/a | 0% | 33% | 100%
Karelia | Coverage | 1 | 9 | 4 | 0 | 1
Karelia | Derived allele frequency | 0% | 67% | 0% | n/a | 0%
Stuttgart | Coverage | 25 | 21 | 15 | 29 | 21
Stuttgart | Derived allele frequency | 0% | 0% | 100% | 0% | 0%


rs4988235 is responsible for lactase persistence in Europe (refs 60, 61). The SNPs at SLC24A5 and SLC45A2 are responsible for light skin pigmentation. The SNP at EDAR (refs 62, 63) affects tooth morphology and
hair thickness. The SNP at HERC2 (refs 64, 65) is the primary determinant of light eye colour in present-day Europeans. We present the fraction of fragments overlapping each SNP that are derived; 
the observation of a low rate of derived alleles does not prove that the individual carried the allele, and instead may reflect sequencing error or ancient DNA damage. Sites highlighted in light grey 
were judged (based on the derived allele count) likely to be heterozygous for the derived allele, and dark grey sites are likely to be homozygous. 
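
The "fraction of fragments overlapping each SNP that are derived" can be sketched as below. The function name is hypothetical, and the derived-read counts in the example are back-calculated from the coverage and percentage shown in the table, so they are assumptions rather than reported values.

```python
# Illustrative sketch: percentage of fragments at a SNP that carry the
# derived allele, with no value when the site has no coverage.

def derived_allele_percent(derived_reads, coverage):
    """Return the derived-allele percentage, or None when coverage is zero."""
    if coverage == 0:
        return None  # reported as n/a in the table
    return round(100 * derived_reads / coverage)

# Derived-read counts here are inferred from the table (assumption).
print(derived_allele_percent(1, 6))  # Kostenki14 at SLC24A5, coverage 6
print(derived_allele_percent(3, 9))  # Bichon at HERC2, coverage 9
print(derived_allele_percent(0, 0))  # GoyetQ116-1 at SLC24A5, coverage 0
```

As the footnote cautions, a low non-zero percentage at a low-coverage site may reflect sequencing error or ancient DNA damage rather than a true derived allele.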


60. Enattah, N. S. et al. Identification of a variant associated with adult-type
hypolactasia. Nature Genet. 30, 233-237 (2002).

61. Bersaglieri, T. et al. Genetic signatures of strong recent positive
selection at the lactase gene. Am. J. Hum. Genet. 74, 1111-1120
(2004).

62. Fujimoto, A. et al. A scan for genetic determinants of human hair morphology:
EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 17, 835-843
(2008).

63. Kimura, R. et al. A common variation in EDAR is a genetic determinant of
shovel-shaped incisors. Am. J. Hum. Genet. 85, 528-535 (2009).

64. Sturm, R. A. et al. A single SNP in an evolutionary conserved region within
intron 86 of the HERC2 gene determines human blue-brown eye color.
Am. J. Hum. Genet. 82, 424-431 (2008).

65. Eiberg, H. et al. Blue eye color in humans may be caused by a perfectly
associated founder mutation in a regulatory element located within the
HERC2 gene inhibiting OCA2 expression. Hum. Genet. 123, 177-187 (2008).
Extended Data Table 6 | All European hunter-gatherers beginning
with Kostenki14 share genetic drift with present-day Europeans


Test | SNPs used | D-value | Z score
Ust’-Ishim | 2,050,358 | 0.003 | 6.6
Oase1 | 278,785 | 0.005 | 10.6
Kostenki14 | 1,676,253 | -0.002 | -5.5
Muierii2 | 95,787 | -0.004 | -6.3
GoyetQ116-1 | 811,756 | -0.004 | -8.0
Kostenki12 | 59,850 | -0.004 | -5.1
Paglicci133 | 79,624 | -0.004 | -5.5
Vestonice13 | 136,598 | -0.004 | -7.1
Vestonice15 | 30,252 | -0.006 | -6.4
Vestonice16 | 914,141 | -0.004 | -9.1
Pavlov1 | 55,835 | -0.005 | -6.3
Vestonice43 | 160,463 | -0.004 | -6.9
KremsWA3 | 229,187 | -0.005 | -10.2
Ostuni1 | 360,347 | -0.004 | -8.6
Malta1 | 1,401,718 | -0.005 | -11.3
ElMiron | 777,654 | -0.007 | -14.7
AfontovaGora2 | 141,073 | -0.007 | -13.6
AfontovaGora3 | 707,617 | -0.006 | -13.6
HohleFels49 | 62,816 | -0.004 | -5.2
Rigney1 | 34,445 | -0.006 | -6.1
GoyetQ-2 | 70,210 | -0.006 | -8.8
Burkhardtshohle | 37,234 | -0.006 | -6.2
Villabruna | 1,170,777 | -0.010 | -24.7
Bichon | 2,034,069 | -0.009 | -23.6
Satsurblia | 1,419,824 | -0.005 | 434
Rochedane | 229,806 | -0.011 | -20.8
Ranchot88 | 402,274 | -0.010 | AE:
Kotias | 2,047,856 | -0.006 | 458
Falkenstein | 64,043 | -0.008 | 116
Chaudardes1 | 90,047 | -0.011 | -16.0
Loschbour | 2,037,082 | -0.011 | -25.4
LaBrana1 | 1,824,307 | -0.009 | -23.0
Motala12 | 1,816,201 | -0.009 | 338
Hungarian.KO1 | 1,372,801 | -0.010 | -26.5
Karelia | 1,701,664 | -0.009 | -21.9
Stuttgart | 2,023,939 | -0.009 | -23.9
BerryAuBac | 53,028 | -0.011 | -14.0


The statistic D(Han, Test; French, Mbuti) was computed measuring whether present-day French
share more alleles with Han or with a Test population (restricting to ancient individuals with at
least 30,000 SNPs covered at least once). Present-day Europeans share significantly more genetic
drift with European hunter-gatherers from Kostenki14 onward than they do with Han. Thus, by the
date of Kostenki14, there was already west Eurasian-specific genetic drift.
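
To make the sign convention of D(Han, Test; French, Mbuti) concrete, here is a minimal sketch of a D-statistic computed from per-SNP allele frequencies. The estimator is the commonly used frequency-based form and is an assumption here, since this section does not spell out the formula or the standard-error procedure behind the Z scores. The function name and the toy frequencies are hypothetical.

```python
# Illustrative sketch of a D-statistic of the form D(W, X; Y, Z), computed
# from per-SNP allele frequencies. The estimator is the commonly used
# frequency-based form (an assumption, not taken from this section).

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from equal-length lists of allele frequencies in [0, 1]."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den

# Toy frequencies (hypothetical): Test shares the derived allele with French
# at the first and third SNPs, whereas Han shares it only at the second.
han = [0.0, 1.0, 0.0, 0.0]
test = [1.0, 0.0, 1.0, 0.0]
french = [1.0, 1.0, 1.0, 0.0]
mbuti = [0.0, 0.0, 0.0, 0.0]

d = d_statistic(han, test, french, mbuti)
print(d)  # negative: French share more alleles with Test than with Han
```

Under this convention a negative D-value, as seen for every individual from Kostenki14 onward, means that French share more alleles with the Test individual than with Han. Swapping the first two arguments flips the sign.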
Midbrain circuits for defensive behaviour


Philip Tovote!*, Maria Soledad Esposito'**, Paolo Botta't, Fabrice Chaudun’, Jonathan P. Fadok', Milica Markovic’, 
Steffen B. E. Wolff'+, Charu Ramakrishnan‘, Lief Fenno‘, Karl Deisseroth*, Cyril Herry’, Silvia Arber!? & Andreas Lüthi!


Survival in threatening situations depends on the selection and rapid execution of an appropriate active or passive 
defensive response, yet the underlying brain circuitry is not understood. Here we use circuit-based optogenetic, in vivo 
and in vitro electrophysiological, and neuroanatomical tracing methods to define midbrain periaqueductal grey circuits 
for specific defensive behaviours. We identify an inhibitory pathway from the central nucleus of the amygdala to the 
ventrolateral periaqueductal grey that produces freezing by disinhibition of ventrolateral periaqueductal grey excitatory 
outputs to pre-motor targets in the magnocellular nucleus of the medulla. In addition, we provide evidence for anatomical 
and functional interaction of this freezing pathway with long-range and local circuits mediating flight. Our data define 
the neuronal circuitry underlying the execution of freezing, an evolutionarily conserved defensive behaviour, which is 
expressed by many species including fish, rodents and primates. In humans, dysregulation of this ‘survival circuit’ has
been implicated in anxiety-related disorders.


Threatening situations, such as the presence of a predator or exposure to 
stimuli predicting imminent or perceived danger, evoke an evolutionarily 
conserved brain state, fear, which triggers defensive behaviours to avoid 
or reduce potential harm’. A long-standing question in fear and anxi- 
ety research has been how brain circuits generate various forms of defen- 
sive behaviours, which have been used as a read-out for normal fear and 
maladaptive anxiety**°. In rodents, depending on threat imminence’ 
and contextual factors such as the existence of escape routes, defensive 
behaviours range from risk assessment® and freezing”’® to flight and 
defensive attack”!!. These behaviours can be rapidly switched to ade- 
quately adapt to fluctuating threat levels or contextual challenges''!*. On 
the basis of electrical stimulation, lesion and pharmacological studies, 
the midbrain periaqueductal grey region (PAG) has been proposed to 
present an essential part of the circuitry that elicits freezing and flight 
in response to threat!?-??, However, PAG circuit mechanisms underly- 
ing expression of defensive behaviours remain poorly understood. This 
includes a lack of knowledge about the functional roles of different PAG 
neuron types, their connectivity and regulation for expression of defen- 
sive behaviours. The PAG receives inputs from key forebrain regions 
involved in regulation of defensive behaviour, such as the central nucleus 
of the amygdala (CEA)™*4 the hypothalamus”*”’ and medial prefrontal 
cortex”, but little is known about their specific PAG cellular targets. In 
addition, the functional roles of long-range inputs to PAG, neuronal 
subpopulations within PAG subregions and intra-PAG microcircuitry, 
as well as outputs from PAG, in the expression of active and passive 
defensive behaviours are poorly understood. Using optogenetic manipulations
of specific cell types, single-unit recordings and rabies-mediated
neuroanatomical tracings, we here define a pathway from the CEA to
the ventrolateral PAG (vlPAG) that mediates freezing by disinhibition of
vlPAG outputs to pre-motor targets in the magnocellular nucleus (Mc) of
the medulla. Furthermore, we provide evidence for anatomical and functional
interaction of this ‘freezing pathway’ with circuits mediating flight.


Freezing is mediated by glutamatergic vlPAG neurons
To determine cellular diversity in the vlPAG that could be associated
with a distinct behavioural phenotype, we used an optogenetic approach
to specifically manipulate the activity of excitatory glutamatergic
neurons, one of the main cell classes in the PAG. We targeted
glutamatergic neurons expressing vesicular glutamate transporter 2
(vGluT2+) by local injection of adeno-associated viruses (AAV)
delivering a construct that contained a Cre-dependent
channelrhodopsin-2 (ChR2) coupled to an mCherry tag into the vlPAG
of Vglut2-ires-Cre mice (Fig. 1a, b and Extended Data Fig. 1a, b).
Mice injected with AAVs containing a fluorescent tag only served as
controls. We first optically manipulated cellular activity in naive mice
under low-fear conditions (that is, at low freezing levels), during
exposure to a novel context. Strikingly, light-activation of vGluT2+
neurons of the vlPAG reliably triggered strong freezing behaviour
during the ‘light on’ period (Fig. 1c and Supplementary Video 1),
which was reflected by a marked decrease in behavioural activity
(Fig. 1d). To define the endogenous function of vlPAG glutamatergic
neurons, we next used viral-vector-mediated, Cre-dependent expression
of archaerhodopsin (Arch) to optically inhibit these neurons
(Fig. 1b and Extended Data Fig. 1c). Under low-fear conditions in
naive mice, we did not observe an effect on freezing or behavioural
activity (Fig. 1e, f). To investigate the necessity of vlPAG glutamatergic
neurons for conditioned freezing behaviour, we subjected mice to
auditory fear conditioning, followed by next-day re-exposure to an
aversively conditioned tone stimulus (CS) or to the context previously
paired with mild electrical foot-shocks. We found that optical inhibition
of vlPAG vGluT2+ neurons during the CS blocked tone-induced
freezing (Fig. 1g and Supplementary Video 2). Similarly, freezing in
the conditioning context was markedly reduced during optical inhibition
of vlPAG vGluT2+ neurons (Fig. 1h). These results demonstrate
a role of vlPAG glutamatergic neurons in mediating conditioned
freezing responses.

To determine whether these neurons serve a more general role 
in mediating freezing, we next tested the necessity of these neurons 
for producing freezing to an innate threat. Mice exhibit strong fear 
responses when exposed to large moving objects, probably because 
these resemble visual features of a natural predator?*”?. We there- 
fore exposed mice in an open-field arena to a remote controlled toy 


1Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland. 2Biozentrum, Department of Cell Biology, University of Basel, 4056 Basel, Switzerland.
3INSERM, Neurocentre Magendie, U862, 146 Rue Léo-Saignat, Bordeaux 33077, France. 4Stanford University, 318 Campus Drive West, Clark Center W080, Stanford, California 94305, USA.
†Present addresses: Champalimaud Centre for the Unknown, Avenida de Brasilia, 1400-038 Lisbon, Portugal (P.B.); Center for Brain Science, Harvard University, Cambridge, Massachusetts
02138, USA (S.B.E.W.).
Figure 1 | Glutamatergic vlPAG neurons drive defensive responses.
a, Optogenetics in freely moving mice. b, Expression patterns of ChR2
(left) and Arch (right) within vGluT2+ vlPAG neurons (triangles, fibre
tracts; scale bars, 200 µm). c, d, Light activation of glutamatergic vlPAG
neurons triggered freezing behaviour (n = 10 ChR2, n = 12 control,
two-tailed Wilcoxon signed-rank test). e, f, Inhibition of vlPAG
glutamatergic neurons had no effect on freezing or behavioural activity in
naive mice (n = 7 Arch, n = 12 control, two-tailed Wilcoxon signed-rank
test). g-i, Inhibition of vlPAG glutamatergic neurons diminished
CS-induced freezing (n = 6 per group, 1 × 3 analysis of variance (ANOVA),
F(2,10) = 37.72, P < 0.001, Tukey’s post-hoc test), contextual freezing (n = 6
Arch, n = 8 control, paired two-tailed Student's t-test) and innate freezing
responses (n = 6 Arch, n = 5 control, paired two-tailed Student's t-test).
j, Light activation of vlPAG glutamatergic neurons induced analgesia
(n = 6 ChR2, n = 10 control, unpaired two-tailed Student's t-test).
Box-whisker plots indicate median, interquartile range and 5th-95th
percentiles of the distribution. Motion plots depict s.e.m. range. *P < 0.05;
**P < 0.01; ***P < 0.001.


snake (Extended Data Fig. 1d). We found that while mice exhibited 
strong freezing responses in the presence of the fear stimulus, this 
reaction was dramatically reduced by yellow-light-mediated inhibition
of vIPAG vGluT2+ neurons (Fig. 1i and Extended Data Fig. 1e).
The reduction in freezing was attributable to optical inhibition and 
not to changes in threat imminence as measured by spatial distance 
between mouse and snake (Extended Data Fig. 1f, h). Taken together, 
these findings demonstrate that activation of vIPAG glutamatergic 
neurons is necessary for both learned and innate freezing, and that it 
can generate freezing in the absence of threat.

Because the PAG is involved in processing ascending and descending
pain information from the periphery, it is conceivable that the
observed behavioural responses were related to enhanced light-evoked
pain perception. However, when we tested nociception using a tail
immersion test, we found that optical activation of vGluT2+ vIPAG
neurons had a marked analgesic effect (Fig. 1j). These experiments
identify a cellular substrate in the vIPAG for analgesia, an important 
part of the defensive response to threat. 


A disinhibitory pathway from CEA to vIPAG 

To address the question of how vIPAG vGluT2+ neurons are regulated,
we first aimed to characterize inputs to the vIPAG from the
CEA, because of their suggested major roles in the expression of
freezing. To identify the CEA neurons contacting vIPAG,
we injected retrogradely transported red fluorescent latex beads
into the vIPAG of a reporter mouse strain, in which GABAergic
(γ-aminobutyric-acid-releasing) cells express enhanced green fluorescent
protein (Gad1-eGFP; Fig. 2a). Retrogradely transported beads
were found throughout the CEA (Fig. 2a). Quantification of overlap
between beads and GFP+ neurons (Fig. 2b) showed that the CEA sends
a GABAergic projection to vIPAG, consistent with previous reports.
Freezing is elicited through enhanced CEA output, but since these
neurons are GABAergic, this would result in enhanced inhibition of 
their vIPAG targets. However, our results from optogenetic manipu- 
lation of freezing behaviour suggest that freezing is associated with 
increased activity of vIPAG glutamatergic neurons. Thus, the most 
parsimonious explanation consistent with these observations would 
involve a local vIPAG disinhibitory circuit mechanism that could con- 
vert an inhibitory input from CEA into enhanced output of vIPAG 
glutamatergic neurons. 

To test this hypothesis, we traced monosynaptic connections of CEA 
cells onto either glutamatergic or GABAergic cells in the vIPAG (Fig. 2c) 
using Cre-dependent, cell-type-specific infection with pseudotyped
EnvA-G-deleted rabies virus (EnvA-ΔG-rabies) in Vglut2-ires-Cre
or Gad2-ires-Cre mouse lines. These experiments revealed that CEA
projections preferentially target vIPAG GABAergic cells (Fig. 2d, e). 
Complementing evidence for a disinhibitory CEA-vIPAG circuit was 
provided by whole-cell patch-clamp recordings of vIPAG GABAergic 
cells during optical activation of CEA terminals in acute brainstem 
slices containing PAG (Extended Data Fig. 2a). This experiment 
confirmed the existence of such inhibitory connections to vIPAG 
GABAergic neurons (Extended Data Fig. 2b-f and Fig. 2f). We 
also probed the existence of functional connections between local 
GABAergic and glutamatergic cells in the vIPAG. Glutamatergic cells 
were visualized by viral-vector-mediated, Cre-dependent expression 
of tdTomato in the vIPAG of Vglut2-ires-Cre mice (Fig. 2g). Using 
a double-conditional viral approach, ChR2 was introduced into
non-vGluT2+ cells of the vIPAG. We recorded from identified vGluT2+
neurons in acute brain slices during optical activation of local,
non-vGluT2+ neurons (Fig. 2h). In 50% of all recorded vGluT2+ cells, we
observed optically evoked inhibitory postsynaptic currents (eIPSCs;
average latency = 5.5 ms) (Fig. 2i) that were completely blocked by
application of picrotoxin, a GABAA receptor antagonist (Fig. 2i, j).
Taken together, our data from slice recordings and tracing experi- 
ments suggest that freezing is caused by a disinhibitory process within 
vIPAG that involves CEA-mediated inhibition of local GABAergic 
neurons resulting in enhanced activity of glutamatergic vIPAG neu- 
rons (Fig. 2f). 
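
The logic of the proposed disinhibitory motif can be illustrated with a toy rate model. The following is only a minimal sketch and not part of the study's analysis: the threshold-linear units, the function names and all parameter values (baseline drives and synaptic weights) are hypothetical and chosen purely for illustration. It shows that increasing inhibitory CEA input lowers the activity of local GABAergic cells and thereby raises glutamatergic output.

```python
def relu(x):
    """Threshold-linear activation: firing rates cannot be negative."""
    return max(0.0, x)


def disinhibition_step(cea_drive, gaba_baseline=8.0, glut_drive=5.0,
                       w_cea_gaba=1.0, w_gaba_glut=1.0):
    """Return (GABAergic rate, glutamatergic rate) for a given CEA drive.

    All parameters are hypothetical illustration values, not measurements.
    CEA input inhibits the local GABAergic cell, which in turn inhibits
    the glutamatergic output cell.
    """
    gaba_rate = relu(gaba_baseline - w_cea_gaba * cea_drive)
    glut_rate = relu(glut_drive - w_gaba_glut * gaba_rate)
    return gaba_rate, glut_rate


if __name__ == "__main__":
    for cea in (0.0, 4.0, 8.0):
        gaba, glut = disinhibition_step(cea)
        print(f"CEA drive {cea:4.1f} -> GABA {gaba:4.1f}, glutamatergic {glut:4.1f}")
```

In this sketch, glutamatergic output stays silent while the GABAergic cell fires at its baseline rate and rises as CEA input suppresses that inhibition, which is the sign inversion the model in Fig. 2f requires.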


Local vIPAG GABAergic neurons control freezing 

To investigate whether in vivo neuronal correlates of freezing in the 
vIPAG would be consistent with a disinhibitory circuit mechanism, 
we next performed single-unit recordings within vIPAG in freely 
moving mice (Fig. 3a). Mice with chronically implanted recording 
electrodes (Fig. 3b and Extended Data Fig. 3a) were fear conditioned 
and freezing responses were evoked during a fear retrieval session 24 h

Figure 2 | A disinhibitory pathway from CEA to vIPAG. a, Fluorescent
latex beads in vIPAG (left, scale bar, 200 µm). Beads were found in medial
(CEm) and lateral (CEl) CEA cells (middle, scale bar, 40 µm; zoom-in
shown in two right panels, scale bar, 5 µm). b, Overlap between beads
and GAD1+ or GAD1− neurons (n = 2 mice, two-tailed Mann-Whitney
test). c, Cell-type-specific monosynaptic rabies tracing strategy. d, Rabies-
labelled cells within CEA of Vglut2-Cre or GAD2-Cre mice (top panels,
scale bar, 100 µm). Starter cells coexpressing TVA-GFP, rabiesG-V5 and
EnvA-ΔG-mCherry-rabies (black triangles, bottom left panel, scale bar,
25 µm). Example of starter cells (white dots, three bottom right panels;
numbers indicate distance from bregma; scale bar, 200 µm). e, Normalized
number of rabies cells in CEA (n = 3 Vglut2-Cre mice, n = 3 GAD2-Cre
mice, unpaired two-tailed Student’s t-test). f, Schematic model of a
disinhibitory pathway from CEA to vIPAG. g, h, Expression of ChR2 in
non-glutamatergic neurons and tdTomato in glutamatergic neurons for
targeted whole-cell patch-clamp recordings (scale bar, 10 µm). i, j, Light-
evoked IPSCs (observed in 50% of all glutamatergic neurons tested) were
blocked by PTX application (n = 12 cells from six slices of four mice,
two-tailed 1 × 3 ANOVA, F(2,15) = 18.88, P < 0.0001, Sidak’s post-hoc test).
Box-whisker plots indicate median, interquartile range and 5th-95th
percentiles of the distribution; bar plots indicate mean ± s.e.m. *P < 0.05,
***P < 0.001.

later. Principal component analysis of single-unit activity in relation
to freezing behaviour revealed that while one population of neurons 
(18%) was activated during freezing bouts, another population (33%) 
was inhibited (Fig. 3c). Consequently, we examined whether glutamatergic
and GABAergic neurons could contribute to these differential
activity patterns. To address this question, we performed recordings
from optogenetically identified GAD2+ and vGluT2+ neurons
within vIPAG (Fig. 3d-f and Extended Data Fig. 3b). In agreement
with a disinhibitory vIPAG freezing circuit, light-identified GAD2+
neurons (n = 4; Fig. 3g) exhibited a relatively high median baseline
firing rate (8.3 Hz), and all showed lower firing rates during freezing 
compared with non-freezing periods (Fig. 3g, h). Recordings from 
identified vGluT2+ units (n = 6, median baseline firing rate = 3.0 Hz;
Extended Data Fig. 3c-e) during freezing periods revealed a more
heterogeneous picture, suggesting the existence of multiple subpopulations
of glutamatergic neurons in vIPAG. Thus, the in vivo correlates

Figure 3 | GABAergic vIPAG neurons control freezing. a, Single-unit
recordings in the vIPAG of freely moving wild-type mice. b, Example of a
recording site (triangle; scale bar, 200 µm). c, vIPAG neuronal populations
showing increased or decreased activity during freezing (n = 8 mice; bin
size, 10 ms). d, Optical identification of GAD2+ single units in the vIPAG.
e, Example of ChR2 expression and recording site (triangle; scale bar,
200 µm). f, Identified GAD2+ neuron activated by light with short latency
(5 ms; bin size, 10 ms). Inset: mean spontaneous and light-evoked spike
waveform. g, h, GAD2+ neurons exhibited reduced firing rates during
freezing (four cells from three mice). i, Optical inhibition of GAD2+
neurons induced freezing in naive mice (n = 6 per group, two-tailed paired
Student's t-test). j, k, Optical activation of GAD2+ neurons impaired
CS-evoked freezing (n = 12 per group, two-tailed Friedman test,
P < 0.0001, Dunn’s multiple comparison test), and shifted CS-induced fear
responses towards active behaviour (n = 12 per group, two-tailed paired
Student’s t-test). Box-whisker plots indicate median, interquartile range,
and 5th-95th percentiles of the distribution; bars indicate mean ± s.e.m.
Motion plots depict s.e.m. range. *P < 0.05, **P < 0.01, ***P < 0.001.


of freezing are consistent with a disinhibitory circuit design leading 
to the activation of a subpopulation of glutamatergic vIPAG neurons 
during freezing. 

An important prediction of this model is that manipulating the activ- 
ity of GABAergic vIPAG neurons should affect freezing. In line with 
this interpretation, optogenetic inhibition of vIPAG GAD2* neurons 
resulted in markedly enhanced freezing levels in naive animals (Fig. 3i 
and Extended Data Fig. 3f). Importantly, activation of GAD2* neu- 
rons reduced freezing in response to a conditioned tone (Fig. 3j and 
Extended Data Fig. 3g). Moreover, this manipulation not only reduced 
CS-induced freezing but also resulted in transiently enhanced loco- 
motor activity, resembling flight responses (Fig. 3k). Freezing was also 
reduced by optical activation of GAD2+ neurons during re-exposure
to the conditioning context (Extended Data Fig. 3h), as well as in
the presence of an unconditioned threatening stimulus (Extended
Data Fig. 3i), while it had no effect on freezing in low-fear conditions
(a novel context; Extended Data Fig. 3j). Taken together, these data
are fully compatible with a circuit organization wherein inhibition of
local GABAergic vIPAG neurons leads to activation of glutamatergic
neurons that is both necessary and sufficient to induce freezing
behaviour. 


vIPAG output to Mc drives freezing 

We next sought to identify the output pathway mediating the freez- 
ing response. Analysis of axonal projections and synaptic boutons of 
vGluT2+ vIPAG neurons (Extended Data Fig. 4a) showed that medullary
regions previously implicated in motor control, such as the
Mc, were a major synaptic target. To determine whether vGluT2+
vIPAG neurons target pre-motor neurons in the Mc directly, we performed
monosynaptic rabies tracing from forelimb motor neurons in
adult Chat-ires-Cre mice (Fig. 4a). We visualized glutamatergic vIPAG 
terminals by unconditional viral expression of the presynaptic marker 
synaptophysin-GFP (Syn-GFP; Extended Data Fig. 4b) with synaptic 
co-localization of vGluT2 (Fig. 4b). Analysis of presynaptic inputs to 
forelimb pre-motor Mc neurons revealed that they were directly con- 
tacted by glutamatergic vIPAG neurons (Fig. 4c and Extended Data 
Fig. 4c-e). 

We then asked whether vIPAG Mc-projecting glutamatergic neu- 
rons receive local inhibitory input. We first used a monosynaptic 
intersectional rabies tracing approach to specifically label presynaptic 
neurons in the vIPAG projecting onto glutamatergic neurons which 
target Mc (Extended Data Fig. 4f). We found that one-third of the 
local presynaptic neurons were GAD1* (Extended Data Fig. 4g, h), 
and thus putative sources of GABAergic inhibition onto glutama- 
tergic vIPAG output cells. Furthermore, we probed the existence of 
functional connections between local GABAergic and Mc-projecting 
glutamatergic vIPAG cells by combining the approach described in 
Fig. 2g with an injection of retrogradely transported latex beads into 
the Mc (Fig. 4d). All Mc-projecting vGluT2* neurons recorded in 
whole-cell patch-clamp showed eIPSCs induced by light activation of 
vIPAG non-vGluT2* cells, which were blocked by picrotoxin applica- 
tion (Fig. 4e). These findings provide evidence that excitatory vIPAG 
output to the Mc is under local GABAergic control and suggest that 
this pathway could be part of the disinhibitory circuit underlying 
freezing. 

To establish functional relevance of this projection for the freez- 
ing response, we next used an intersectional optogenetic approach 
to specifically manipulate activity of the glutamatergic vIPAG-to-Mc
projection. We injected into the Mc of Vglut2-ires-Cre mice
retrogradely trafficked herpes simplex virus (HSV), which Cre-dependently
expresses flippase (Flp). This allowed us to selectively
introduce ChR2 into vIPAG glutamatergic neurons projecting to 
Mc (vIPAG-to-Mc) on the basis of their co-expression of both Cre 
and Flp using double conditional AAVs (Fig. 4f and Extended Data 
Fig. 4i-k). Optical activation of glutamatergic vIPAG-to-Mc neurons 
resulted in instantaneous and strong freezing behaviour as reflected 
by decreased behavioural activity (Fig. 4g, h, Extended Data Fig. 4l
and Supplementary Video 3). Interestingly, and in contrast to the
anti-nociceptive effect elicited upon projection-unspecific activation
of vGluT2+ neurons in the vIPAG (Fig. 1j), no analgesia was
observed after stimulation of the glutamatergic vIPAG-to-Mc neurons 
(Fig. 4i). Together, these findings demonstrate that vIPAG-to-Mc 
glutamatergic projection neurons are specifically involved in the 
expression of freezing behaviour and suggest that analgesia and freezing 
could be mediated by distinct subpopulations of vIPAG glutamatergic 
neurons. 


Interactions between freezing and flight pathways 

Our data show that activation of vIPAG vGluT2+ neurons induces
freezing (Fig. 1). However, in mice with viral expression extending to 
dl/IPAG, we observed radically different, active defensive behavioural 
responses through optical activation of vGluT2* neurons (Fig. 5a 
and Extended Data Fig. 5a). These consisted of strong light-induced 
locomotor activity (Fig. 5b), amounting to marked flight responses

Figure 4 | Glutamatergic output to the Mc drives freezing. a, Strategy to
assess vIPAG input onto Mc pre-motor neurons. b, vIPAG glutamatergic
synapses contacting Mc pre-motor neurons were identified by co-staining
of Syn-GFP and vGlut2 (triangles; scale bar, 2 µm). c, Three-dimensional
reconstruction of a Mc pre-motor neuron contacted by vIPAG vGluT2+
synapses (yellow circles; scale bar, 10 µm). d, Intersectional approach to
investigate connectivity of vIPAG GABAergic cells with Mc-projecting
glutamatergic neurons. e, Light activation of non-glutamatergic fibres
evoked IPSCs in all bead+ Mc-projecting glutamatergic neurons
tested (nine out of nine cells, one example cell shown), and six out of
nine glutamatergic, bead− neurons. All eIPSCs were blocked by PTX
application (four slices of four mice; bead+: two-tailed Wilcoxon signed-rank
test; bead−: 1 × 3 ANOVA, F(2,10) = 10.28, P < 0.01, Sidak’s post-hoc
test). f, Strategy to express ChR2 selectively in glutamatergic vIPAG
neurons projecting to Mc. g, h, Light activation of glutamatergic vIPAG
neurons projecting to Mc in naive mice resulted in strong freezing (n = 7
ChR2, n = 10 control, two-tailed Wilcoxon signed-rank test). i, Light
activation of the glutamatergic vIPAG-to-Mc projection had no effect on
nociception (n = 5 ChR2, n = 11 control, unpaired two-tailed Student's
t-test). Box-whisker plots indicate median, interquartile range and
5th-95th percentiles of the distribution; bar plots indicate mean ± s.e.m.
Motion plots depict s.e.m. range. *P < 0.05; **P < 0.01.


in some cases. Nonetheless, bouts of forward locomotor activity 
were often interrupted by short periods of freezing, which resulted, 
on average, in intermediate levels of freezing (Fig. 5c and Extended 
Data Fig. 5b). Broad activation of vGluT2* neurons within PAG also 
resulted in strong analgesia (Fig. 5d). Given the observation of alter- 
nating active and passive defensive responses, this strongly suggests 
intricate interactions between flight and freezing circuits and raises

Figure 5 | Interactions between PAG freezing and flight pathways.
a, Optogenetic manipulation of vGluT2+ neurons in dl/IPAG and vIPAG
(triangles indicate fibre tracts; scale bar, 200 µm). b, c, Light activation
of PAG glutamatergic neurons in naive mice resulted in flight responses
and intermediate freezing levels (n = 12 ChR2, n = 12 control, two-tailed
Wilcoxon signed-rank test). d, Light activation of PAG glutamatergic
neurons resulted in strong analgesia (n = 9 ChR2, n = 10 control, unpaired
two-tailed Student's t-test). e, Hypothalamic presynaptic inputs onto
vGluT2+ (n = 3 mice for each vIPAG and dl/IPAG) and GAD2+ (n = 4
mice) neurons within vIPAG or dl/IPAG (1 × 3 ANOVA, Tukey’s post-hoc
test for each analysed region). Quantification reveals differential input
of hypothalamic subregions to vGluT2+ (n = 3 mice for each vIPAG and
dl/IPAG) and GAD2+ (n = 4 mice) neurons within vIPAG or dl/IPAG.
Ventromedial hypothalamic nucleus (VMH) (1 × 3 ANOVA, F(2,7) = 40.22,
P < 0.0001, Tukey’s post-hoc test), posterior hypothalamic nucleus
(PH) (1 × 3 ANOVA, F(2,7) = 21.01, P < 0.05, Tukey’s post-hoc test) and


the question of how such interactions are implemented within PAG 
circuitry. 

We therefore examined whether distinct presynaptic inputs 
differentially connect onto specific PAG neuronal subpopula- 
tions (Extended Data Fig. 5c). Monosynaptic, cell-specific rabies 
tracing revealed high selectivity in the hypothalamic projections 
to defined PAG neuronal subpopulations (Fig. 5e and Extended 
Data Fig. 5d-f). It is conceivable that hypothalamic input to both 
dl/IPAG and vIPAG glutamatergic neurons promotes a range 
of defensive behaviours. In turn, activity of the disinhibitory
pathway originating in the CEA might bias the behavioural
response towards freezing instead of flight. We thus hypothesized 
that, because of their role in controlling freezing, GABAergic neu- 
rons of the vIPAG are poised to present a neuronal substrate for 
the interaction of freezing and flight pathways. Consequently, we 
asked whether glutamatergic, flight-promoting neurons of the 
dl/IPAG could negatively regulate vIPAG excitatory output via acti- 
vation of vIPAG GABAergic neurons to inhibit the freezing response. 
In offspring of Vglut2-ires-Cre crossed with Gad1-eGFP mice, we 
performed minimal injections of diluted virus to Cre-dependently 
express Syn-Myc in vGluT2* dl/IPAG neurons only (Fig. 5f). In line 
with our hypothesis, we found that dl/IPAG glutamatergic neurons 
form synaptic contacts with GABAergic vIPAG cells (Fig. 5g, h), 
whose optical activation can lead to flight responses (Fig. 3k). These 
results support the notion that vIPAG GABAergic neurons integrate 
multiple inhibitory and excitatory inputs from distinct upstream 
brain areas to regulate the selection of appropriate active or passive 
defensive behaviours. 


Discussion 

Our study defines an amygdala-midbrain-medullary circuit through 
which freezing behaviour, an evolutionarily conserved response to 
threat, is generated. Central to this process is a circuit mechanism

premammillary nucleus (PMD) (1 × 3 ANOVA, F(2,7) = 287, P < 0.0001,
Tukey’s post-hoc test) preferentially target dl/IPAG vGluT2+ neurons,
whereas dorsomedial hypothalamic nucleus (DM) (1 × 3 ANOVA,
F(2,7) = 4.082, P > 0.05), anterior hypothalamus, medial part (AHM) (1 × 3
ANOVA, F(2,7) = 3.294, P > 0.05), peduncular part of lateral hypothalamus
(PLH) (1 × 3 ANOVA, F(2,7) = 0.153, P > 0.05) and lateral hypothalamic
area (LH) (1 × 3 ANOVA, F(2,7) = 0.948, P > 0.05) showed no preference.
f, Expression of Syn-myc in glutamatergic terminals of dl/IPAG projections
to GABAergic neurons within vIPAG (left scale bar, 200 µm; right scale
bar, 20 µm). g, High-resolution image of a GAD1+ vIPAG neuron with
dl/IPAG input (scale bar, 5 µm). h, Quantification of dl/IPAG vGluT2+
synapses onto vIPAG GAD1+ (n = 21 ipsilateral cells from four mice,
n = 13 contralateral cells from two mice, unpaired two-tailed Student’s
t-test). Box-whisker plots indicate median, interquartile range and
5th-95th percentiles of the distribution; bar plots indicate mean ± s.e.m.;
motion plots depict s.e.m. *P < 0.05; ***P < 0.001.


involving the disinhibition of vIPAG glutamatergic neurons projecting 
to pre-motor cells located in the Mc. Disinhibition of this vIPAG—Mc 
pathway is generated by a disynaptic GABAergic local micro-circuit 
receiving inhibitory input from CEA (Extended Data Fig. 6). It is 
important to note that the excitatory vIPAG to Mc pathway did not 
mediate concomitant analgesia, a hallmark of the general defensive 
response to threat. Consistent with this result, our single-unit data 
show that some glutamatergic neurons of the vIPAG are positively 
and others are negatively correlated with freezing behaviour. These 
findings suggest that different aspects of the defensive response, such 
as freezing, flight and analgesia, could be mediated by distinct gluta- 
matergic output pathways from the PAG. 

Our data suggest the existence of a dedicated vIPAG output medi- 
ating freezing. However, to ensure a rapid switch between passive 
and active coping with fluctuating threat levels, interactions between 
freezing and flight circuits are required. Active defensive behaviour 
including flight could be driven by different hypothalamic or
prefrontal inputs, directly or via disinhibition onto dl/IPAG glutamatergic
neurons, which concomitantly might block freezing
behaviour by activation of GABAergic neurons controlling excitatory 
vIPAG output to the medulla. This notion is supported by our finding 
of glutamatergic inputs from dl/IPAG onto GABAergic neurons of 
the vIPAG. While earlier models of PAG function have emphasized 
its columnar organization, or the existence of parallel
input-output pathways mediating active or passive defensive behaviours,
the model emerging from our study supports a key role
for local PAG circuitry in the integration of extrinsic inputs to ensure 
rapid behavioural, autonomic and endocrine adaptations in the face 
of threat. 

A growing body of evidence suggests that the interactions of 
distinct types of neuron within highly organized neuronal circuits 
are critical for any higher brain function, and that circuit dysregulation
contributes to psychiatric conditions, among which fear
and anxiety-related disorders are the most prevalent. Our study
shows that similar organizational principles and functional motifs exist
even within evolutionarily old, mammalian ‘survival circuits’ dedicated
to expression of defensive behaviours. A mechanistic functional
understanding of these circuits will provide new insights into possible 
mechanisms underlying human psychiatric conditions associated with 
maladaptive coping behaviours under stressful conditions. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS

Animals. Experimental subjects were adult (2- to 5-month-old) male, wild-type
(Charles River Laboratories) or mutant mice of the C57BL/6J strain. Slc17a6tm2(cre)Lowl
(Vglut2-ires-Cre) and Chattm2(cre)Lowl (Chat-ires-Cre) mice were obtained from
Jackson Laboratories. Founders for the Gad2tm2(cre)Zjh (Gad2-ires-Cre) and Gad1-eGFP
mouse colonies were initially provided by Z. J. Huang.
Tau-lox-stop-lox-SynGFP-IRES-nlsLacZpA mice came from an in-house colony. All mice were
individually housed in a 12 h light/dark cycle and all experiments were performed
during the light cycle. Food and water were available ad libitum. Sample sizes were
estimated based on previous studies using similar experimental designs.
All animal procedures were performed in accordance with institutional guidelines 
and were approved by the Veterinary Department of the Canton of Basel-Stadt. 
Viral injections and optogenetics. Isoflurane (Attane, induction 3%, maintenance
1.5%; Provet) in oxygen-enriched air (Oxymat 3; Weinmann) was used to
anaesthetize mice fixed in a stereotactic frame (Kopf Instruments 1900 series).
Before opening of the scalp, local injections of ropivacaine (Naropin; AstraZeneca)
provided analgesia during surgery. After completion of surgery, intraperitoneal
injections of meloxicam were administered to alleviate pain (60 µl of 0.5 mg ml−1,
Metacam; Boehringer). A feedback-controlled heating pad (FHC) ensured maintenance
of core body temperature at 36 °C. A volume of 50-200 nl virus solution
(depending on respective viral titre and observed expression strength) was
pressure-injected intracranially using calibrated glass pipettes (5 µl microcapillary tube;
Sigma-Aldrich) connected to a picospritzer III (Parker). To avoid the subcranial
midline blood sinus when targeting the vIPAG, holes with a diameter of 0.3 mm were
drilled bilaterally into the skull at ±1.7 mm (dl/IPAG: ±1.4 mm) from the midline
suture, and at the level of the lambda suture. The injection capillary was then
slowly lowered using a hydraulic micropositioner (Kopf Instruments model 2650)
at a zenith angle of 26° to the target depth of 3 mm (dl/IPAG: 2.6 mm) below
brain surface. Coordinates for CEA injections were −1.1 mm caudal and ±2.7 mm
mediolateral to bregma, at −4.2 mm perpendicularly below brain surface. The
Mc of the medulla was targeted with bilateral perpendicular injections −6.4 mm
caudal and ±0.95 mm mediolateral to bregma, with an injection depth of 5.6 mm.
Cell-type-specific expression of optical actuators was achieved using the following
Cre-dependent AAVs: rAAV(2/5)/EF1a-flex-hChR2(T159C)-mCherry (UNC
Vector Core), rAAV(2/9)/CAG-flex-ReaChR-Citrine-YFP-WPRE (custom design,
Vector Biolabs), rAAV(2/7)EF1a-flex-ChR2(H134R)-2A-NpHR-2A-Venus and
rAAV(2/5)CBA-flex-ARCH-GFP. For characterization of the functional connection
between GABAergic and glutamatergic neurons in the vIPAG, we introduced
ChR2 into vGluT2−, putative GABAergic neurons in the vIPAG using a
double-conditional approach in Vglut2-ires-Cre mice. We co-injected two AAVs: one
that delivered Flp recombinase in an unconditional manner and one that mediated
expression of ChR2 only in the presence of Flp and in the absence of Cre
(AAVdj/hSyn-CreOFF-FlpON-hChR2(H134R)-eYFP). Visual targeting of vGluT2+ neurons
was achieved by Cre-dependent expression of tdTomato (rAAV(2/9)-flex-tdTomato).
For Cre- and Flp-dependent expression of ChR2 in the glutamatergic
vIPAG-to-Mc projection, we combined an injection of a retrogradely trafficked
HSV (HSV/hEF1a-LS1L-mCherry-IRES-flpo; R. Neve) into the Mc with another
injection of AAV(dj)/hSyn-CreON/FlpON-hChR2(H134R)-mCherry into the
vIPAG of Vglut2-ires-Cre mice. Control mice were injected with the following
AAVs: AAV(2/5)EF1a-flex-tdTomato (provided by G. Keller),
AAV(2/9)CAG-flex-eGFP-WPRE-bGH, AAV(2/9)CAG-flex-tdTomato-WPRE-bGH (both Penn
Vector) and AAV(dj)/hSyn-CreON/FlpON-mCherry.
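
The recombinase logic behind these conditional vectors can be summarized as a small truth table. The following is a minimal sketch for orientation only: the function name and the construct labels (`cre_on`, `cre_off_flp_on`, `cre_on_flp_on`) are hypothetical shorthand introduced here, and the sketch ignores real-world factors such as viral titre, tropism and incomplete recombination.

```python
def expresses(construct: str, cre: bool, flp: bool) -> bool:
    """Idealized expression rule for a conditional vector in a single cell.

    construct labels are hypothetical shorthand:
      'cre_on'          flex vectors: expressed only in Cre+ cells
      'cre_off_flp_on'  expressed only with Flp and without Cre
      'cre_on_flp_on'   expressed only when both Cre and Flp are present
    """
    rules = {
        "cre_on": cre,
        "cre_off_flp_on": (not cre) and flp,
        "cre_on_flp_on": cre and flp,
    }
    return rules[construct]


if __name__ == "__main__":
    # Enumerate every recombinase combination for each construct class.
    for construct in ("cre_on", "cre_off_flp_on", "cre_on_flp_on"):
        for cre in (False, True):
            for flp in (False, True):
                print(construct, f"Cre={cre}", f"Flp={flp}",
                      "->", expresses(construct, cre, flp))
```

Under these idealized rules, unconditional Flp delivery in Vglut2-ires-Cre mice restricts a Cre-off/Flp-on payload to Cre-negative (non-vGluT2) cells, whereas a Cre-on/Flp-on payload is confined to Cre-positive cells that also received the retrogradely delivered Flp.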

For optical manipulation or electrophysiological recordings, mice were 
implanted with custom-built fibre connectors (fibre: 0.48 numerical aperture,
200 µm diameter; Thorlabs) 3-4 weeks after virus injections. The tip of the fibre
was lowered at an angle of 26° to 250 µm above the injection site in the PAG.
Implants were fixed to the skull with skull screws (P.A. Precision Components), 
cyanoacrylate glue (Ultra Gel; Henkel) and dental cement (Paladur; Heraeus). 
All fibre connectors were tested for effective light transduction before 
implantation. For optical stimulation of ChR2, laser light of 473 nm (CNI Laser)
was applied, whereas laser light of 594 nm was used for optical stimulation of NpHR
or Arch. Light intensity was adjusted with an optical power meter (Thorlabs) to
reach 10-15 mW at the end of the implanted fibre stub.

Histology. After completion of experiments, mice were transcardially perfused 
with 4% paraformaldehyde in phosphate-buffered saline (PBS). Fixed brains were 
cryoprotected in 30% sucrose/PBS and cut on a cryostat in 80 µm coronal slices.
Antibody stainings were performed on single-well floating tissue sections. Sections
were incubated for 48 h in primary antibodies at 4 °C followed by one overnight
incubation with secondary antibodies at 4 °C. Primary antibodies used in this study
were as follows: chicken anti-GFP 1:1,000 (A10262, Molecular Probes), rabbit anti-RFP
1:5,000 (600-401-379, Rockland Immunochemicals), guinea pig anti-vGlut2
1:5,000 (AB5907, Chemicon), mouse anti-V5 1:1,000 (R960CUS, Invitrogen),
mouse anti-NeuN 1:1,000 (MAB377, Chemicon), mouse anti-Myc 1:100
(CRL-1729, ATCC), goat anti-βgal 1:4,000 (AR2282, Biogenesis), mouse
anti-channelrhodopsin-2 1:2 (clone 15E2, 651180, PROGEN Biotechnik), guinea pig
anti-rabiesG 1:500 (provided by P. Scheiffele). Fluorophore-tagged secondary antibodies
used were Alexa Fluor 488 donkey anti-chicken IgY 1:1,000 (703-545-155,
Jackson), Cy3 donkey anti-rabbit IgG 1:1,000 (711-165-152, Jackson), Alexa Fluor
647 donkey anti-guinea pig IgG 1:1,000 (706-605-148, Jackson), Alexa Fluor 647
donkey anti-mouse IgG 1:1,000 (715-605-150, Jackson), Alexa Fluor 647 donkey
anti-goat IgG 1:1,000 (705-605-147, Jackson) or Alexa Fluor 488 donkey
anti-mouse IgG 1:1,000 (A21202, Molecular Probes). For counterstaining,
sections were incubated for 10 min with 4′,6-diamidino-2-phenylindole (DAPI,
1:10,000, Sigma). Stained brain sections were mounted on gelatin-coated slides 
and coverslipped with custom-made glycerol-based medium (Fluorostab). Slides 
were imaged using an automated slide scanner microscope (Zeiss Axioscan). 

Placement of the optical fibres was assessed on the basis of the lesion left by the fibre tip
in the brain tissue. Mice with no or unilateral expression of the virus, or fibre
tip placement outside of the PAG, were excluded from the analysis. To analyse virus
expression in Vglut2-ires-Cre mice expressing ChR2(T159C), we outlined the area
of somatic viral expression on the respective sections in a mouse brain atlas for
each animal, and then overlaid the areas at 30% transparency (Adobe Illustrator) 
to visualize the average centre of expression in each behavioural group. 
Behaviour. Mice of different litters but the same genotype were used in individ- 
ual experiments. No criteria were used to allocate mice to experimental groups, 
and, for blinding, experimental subjects had unique letter/number identifiers that 
indicated genotype but no group assignment. All basic testing of light-induced 
behavioural effects was performed under low-fear conditions in a novel, circu- 
lar Plexiglas cylinder with smooth white floor (diameter 27 cm) under dim-light 
conditions in a dark-walled sound-attenuated chamber. Acetic acid (1%) was used 
to clean the context and to provide a distinct olfactory stimulus. After 1 min of 
habituation, light was applied four times for 5s, with an inter-stimulus interval 
of 55s. Total duration of freezing during the four ‘light on’ periods was compared 
with time spent freezing in the period of equivalent length immediately before 
onset of the first light stimulus. 
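
The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study: the function names are hypothetical, freezing bouts are assumed to be available as (start, end) times in seconds, and the 55 s inter-stimulus interval is assumed to run from light offset to the next light onset.

```python
def overlap(bouts, start, end):
    """Total time (s) that freezing bouts spend inside the window [start, end)."""
    return sum(max(0.0, min(b_end, end) - max(b_start, start))
               for b_start, b_end in bouts)


def light_windows(habituation_s=60.0, n_pulses=4, pulse_s=5.0, isi_s=55.0):
    """Light-on windows after habituation.

    Assumption: the inter-stimulus interval is measured from light offset
    to the next light onset.
    """
    windows = []
    t = habituation_s
    for _ in range(n_pulses):
        windows.append((t, t + pulse_s))
        t += pulse_s + isi_s
    return windows


def freezing_light_vs_baseline(bouts, windows):
    """Return (freezing during light on, freezing in an equally long pre-light period)."""
    light_total = sum(overlap(bouts, s, e) for s, e in windows)
    total_len = sum(e - s for s, e in windows)
    first_onset = windows[0][0]
    baseline_total = overlap(bouts, first_onset - total_len, first_onset)
    return light_total, baseline_total
```

With four 5 s pulses the baseline window is the 20 s immediately preceding the first light onset, which matches the "period of equivalent length" described in the text.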

To investigate light-mediated effects on conditioned freezing, mice were 
subjected to auditory fear conditioning in a brightly illuminated square context 
(27cm x 27cm) with a metal grid floor. A train of 20 tone beeps (7.5 kHz, 75 dB 
sound pressure level, 500 ms duration, 500 ms inter-beep-interval) was used as the 
conditioned stimulus (CS) and an electrical foot-shock (0.6 mA dc, 1s duration) 
was used as the unconditioned stimulus (US). The conditioning session lasted 
440s during which the mice were exposed to three back-to-back CS-US pair- 
ings in a pseudo-random fashion, with a baseline period of 180s and a minimal 
inter-stimulus interval of 80s. On the day after conditioning, mice were exposed 
to four CS-only presentations in a dimly illuminated context different from the 
conditioning context. While the conditioning context was cleaned with 70% etha- 
nol, the retrieval context was wiped down with 1% acetic acid. The duration of the 
retrieval session was 540s, with a baseline period of 180s and a pseudo-random 
presentation of the CS with a minimal inter-stimulus interval of 120s. To test for 
light-activation effects on CS-induced defensive behaviour, the second and fourth 
presentation of the CS was paired with 20s of continuous laser light. Total duration 
of freezing during the two CS-alone periods (first and third) was compared with 
time spent freezing during the two CS periods (second and fourth) paired with 
‘light on’, and with a baseline period of equal length (40 s) directly before onset of 
the first CS. Contextual fear was tested a day later by re-introducing the 
experimental subject into the original conditioning context for 5 min. To test for effects 
of light activation on contextual freezing, laser illumination was turned on twice for 
1 min, with a 1 min pre-baseline and a 1 min inter-stimulus interval. Total duration 
of freezing during the two ‘light on’ periods was compared with time spent freezing 
in the time period of equivalent length immediately before light onset. 

For in vivo recordings of unidentified PAG single units, auditory fear condition- 
ing and testing took place in two different contexts (contexts A and B). To measure 
movement, an automated infrared beam detection system located on the bottom 
of the experimental chambers was used (Coulbourn Instruments). The animals 
were considered to be freezing if no movement was detected for 2s (ref. 38). On 
day 1, C57BL6/J mice were submitted to a habituation session in context A, in 
which they received four presentations of the CS+ and the CS− (total CS duration, 
30 s; consisting of 50-ms pips at 0.9 Hz repeated 27 times, 2 ms rise and fall, pip 
frequency, 7.5 kHz or white noise, 80 dB sound pressure level). Discriminative fear 
conditioning was performed on the same day by pairing the CS+ with a US (1 s 
foot-shock, 0.6 mA, 5 CS+–US pairings, inter-trial intervals, 20–180 s). The onset 
of the US coincided with the offset of the CS+. The CS− was presented after each 
CS+–US association but was never reinforced (five CS− presentations; inter-trial 
intervals, 20–180 s). The frequencies used for CS+ and CS− were counterbalanced 
across animals. On day 2, conditioned mice were submitted to a testing session 
(retrieval session) in context B during which they received four presentations of 
the CS+ and CS−. 
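
As a quick consistency check on the stimulus parameters, 27 pips delivered at 0.9 Hz span the stated 30 s CS duration. The short Python sketch below reproduces that arithmetic; the function names are illustrative and not taken from the study.

```python
def pip_onsets(n_pips=27, rate_hz=0.9):
    """Onset time (s) of each pip relative to CS onset."""
    period_s = 1.0 / rate_hz
    return [i * period_s for i in range(n_pips)]


def cs_duration(n_pips=27, rate_hz=0.9):
    """Total CS duration (s), counting one full period per pip."""
    return n_pips / rate_hz
```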

To test for effects on unconditioned freezing, mice were exposed to a 
remote-controlled toy snake in an open-field arena (50cm x 50cm). To control 
for baseline activity and laser effects, mice were pre-exposed to the open field alone 
for 10 min, with four laser illumination periods of 20s (inter-stimulus interval 40 s) 
in the second half of this phase. After introduction of the snake, mice remained in 
the open field for another 5 min, during which the snake’s movement was remotely 
controlled by the experimenter outside the chamber. To maintain high freezing 
levels, the snake was moved around the area, thereby covering varying distances 
to the mouse without establishing direct contact. 

Locomotor activity was recorded by an infrared beam system (Coulbourn 
Instruments) or an overhead video tracking system (CinePlex Studio). Freezing 
was defined as immobility detected by lack of beam breaks for 2s, as described 
before**. Using the video tracking system, freezing was extracted with the freezing 
detector plug-in (CinePlex Editor). A 2s criterion of the thresholded motion meas- 
ure, based on a contour-tracking algorithm, was used to define freezing. Motion 
measure was automatically computed as the normalized frame-by-frame difference 
of the animal's body contour in pixels. Automatically detected freezing behaviour 
was cross-checked on the video recording to exclude false-positive freezing bouts, 
for example during grooming episodes, or include false negative freezing intervals, 
for example owing to motion artefacts caused by cable movement in front of the 
camera. Timestamps for freezing episodes and stimulation events (CS, US, laser) 
were imported into data analysis software (Neuroexplorer 5, Nex Technologies) 
and averaged over the respective time interval. Behavioural activity of the animal 
was assessed using the motion measure of the video tracking system. Motion was 
averaged over 1-s bins and normalized as the percentage change in relation to a 
baseline period. Baseline was the time directly before stimulus onset: that is, 16s 
before ‘light on’ in experiments with naive mice, and 10 s before CS/CS + ‘light on’ 
in cued fear-conditioning experiments. Combined contour tracking of the mouse 
and colour-tracking of the toy snake were used to extract x-y coordinates of the 
two subjects and calculate their distance. 
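
The 2 s freezing criterion and the baseline normalization of the motion measure can be illustrated as follows. This is a minimal sketch under stated assumptions, not the CinePlex plug-in's algorithm: the motion trace is assumed to be a list of per-frame values, and the threshold, frame rate and function names are hypothetical inputs.

```python
def freezing_bouts(motion, threshold, frame_rate_hz, min_duration_s=2.0):
    """Return (start_s, end_s) for runs of sub-threshold motion lasting >= min_duration_s."""
    bouts = []
    run_start = None
    # A sentinel value above any threshold closes a run that reaches the end of the trace.
    for i, m in enumerate(list(motion) + [float("inf")]):
        if m < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / frame_rate_hz >= min_duration_s:
                bouts.append((run_start / frame_rate_hz, i / frame_rate_hz))
            run_start = None
    return bouts


def percent_change_from_baseline(binned_motion, n_baseline_bins):
    """Express 1-s motion bins as percentage change relative to the mean of the baseline bins."""
    baseline = sum(binned_motion[:n_baseline_bins]) / n_baseline_bins
    return [100.0 * (m - baseline) / baseline for m in binned_motion]
```

Automatically detected bouts would still need the manual cross-check against the video described above, since grooming or cable movement can produce false positives and false negatives.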

Tail immersion test. To test for analgesic effects induced by optical stimulation 
of glutamatergic cells of the PAG, mice tail tips were immersed in hot water with 
a temperature of 50 °C. This was done eight times with an inter-trial interval of 
40s, and tail withdrawal latency was scored frame-by-frame (Windows Live Movie 
Maker) from the video recorded during the test session (Plexon Cineplex). On the 
last four trials, laser light was turned on for 5s directly before the tail immersion. 
If the mouse did not withdraw its tail within 10s of immersion, the trial was 
terminated. Tail withdrawal latencies of the first four non-manipulated trials were 
averaged as baseline and compared with average withdrawal latency during the last 
four light-stimulated trials. 
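
The baseline versus light-stimulated comparison amounts to two means over four trials each. The sketch below is illustrative only; it assumes that trials terminated at the 10 s cut-off are scored as 10 s, which the text does not state explicitly, and the function name is hypothetical.

```python
from statistics import mean

CUTOFF_S = 10.0  # trials are terminated after 10 s of immersion


def tail_withdrawal_summary(latencies_s, n_baseline=4):
    """Return (mean baseline latency, mean light-stimulated latency) in seconds.

    Assumption: a trial with no withdrawal within the cut-off is scored as CUTOFF_S.
    """
    capped = [min(x, CUTOFF_S) for x in latencies_s]
    return mean(capped[:n_baseline]), mean(capped[n_baseline:])
```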

Slice electrophysiology. Standard procedures were used to prepare 300-µm-thick 
coronal slices from 12- to 14-week-old male Vglut2-ires-Cre or 
Vglut2-ires-Cre::Gad1-eGFP mice, which had received intracranial virus injections 4 weeks 
before. The brain was dissected in ice-cold artificial cerebrospinal fluid, mounted 
on an agar block and sliced with a vibrating-blade microtome (HM 650 V, Carl 
Zeiss) at 4 °C. Slices were maintained for 45 min at 37 °C in an interface chamber 
containing artificial cerebrospinal fluid equilibrated with 95% O2/5% CO2 and 
containing the following (in mM): 124 NaCl, 2.7 KCl, 2 CaCl2, 1.3 MgCl2, 
26 NaHCO3, 0.4 NaH2PO4, 18 glucose, 4 ascorbate. Recordings were performed 
with artificial cerebrospinal fluid in a recording chamber at a temperature of 35 °C 
at a perfusion rate of 1–2 ml min−1. PAG neurons were visually identified with 
infrared video microscopy using an upright microscope equipped with a ×40 
objective (Olympus). Patch electrodes (3–5 MΩ) were pulled from borosilicate glass 
tubing. For voltage clamp experiments to record eIPSCs, patch electrodes were 
filled with a solution containing the following (in mM): 110 CsCl, 30 K-gluconate, 
1.1 EGTA, 10 HEPES, 0.1 CaCl2, 4 Mg-ATP, 0.3 Na-GTP (pH adjusted to 7.3 
with CsOH, 280 mOsm) and 4 N-(2,6-dimethylphenylcarbamoylmethyl) 
triethylammonium bromide (QX-314; Tocris-Cookson). 

Evoked IPSCs were elicited by 10 ms blue-light stimulation of either local vlPAG 
axon terminals of non-glutamatergic neurons expressing ChR2 or ChR2+ CEA 
axons projecting to PAG. To exclude glutamatergic inputs, CNQX 
(6-cyano-7-nitroquinoxaline-2,3-dione, 10 µM; AMPA receptor antagonist) and (R)-CPP 
((RS)-3-(2-carboxypiperazin-4-yl)-propyl-1-phosphonic acid, 10 µM; NMDA 
receptor antagonist) were added to the artificial cerebrospinal fluid. To confirm 
the GABAergic nature of the eIPSCs, picrotoxin (100 µM) was added at the end of the 
recordings. Successful connections were scored if the amplitude of eIPSCs was 
higher than 10 pA, with the latency within 10 ms for at least 60% of the trials (six 
out of ten trials). Whole-cell patch-clamp recordings were excluded if the access 
resistance exceeded 13 MΩ and changed more than 20% during the recordings. 
Data were recorded with a MultiClamp 700B (Molecular Devices) amplifier, 
filtered at 0.2 kHz and digitized at 10 kHz. Data were acquired and analysed with 
Clampex 10.0 and Clampfit 10.0 (Molecular Devices). All chemicals for the internal 
and external solutions were purchased from Fluka/Sigma. Glutamatergic blockers 
were purchased from Tocris Bioscience. 
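
The connection criterion stated above (amplitude above 10 pA with latency within 10 ms in at least 60% of trials) can be written as a simple predicate. This is a sketch under assumptions: each trial is taken to be an (amplitude in pA, latency in ms) pair, the amplitude is compared as an absolute value, both conditions are applied per trial, and the function name is hypothetical.

```python
def is_connected(trials, min_amplitude_pa=10.0, max_latency_ms=10.0, min_fraction=0.6):
    """Score a connection from per-trial (amplitude_pA, latency_ms) pairs.

    A trial counts as a response when the absolute amplitude exceeds
    min_amplitude_pa and the latency is at most max_latency_ms.
    A latency of None marks a trial with no detectable response.
    """
    hits = sum(
        1
        for amplitude, latency in trials
        if latency is not None
        and abs(amplitude) > min_amplitude_pa
        and latency <= max_latency_ms
    )
    return hits / len(trials) >= min_fraction
```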

Anatomical tracing. To characterize CEA inputs to vlPAG, we injected 
retrogradely transported fluorescent latex beads (Lumafluor) into the vlPAG of two 
Gad1-eGFP mice. Four days after injection, mice were killed, transcardially 
perfused with 4% paraformaldehyde in PBS, and brains were extracted and processed 
for histology as described above. On four coronal sections from each mouse, bead+ 
cells in the CEl and CEm were counted and normalized against total cells stained 
by NeuN. 

To identify the presynaptic partners of specific neuronal subpopulations 
in the PAG, we used a monosynaptically restricted pseudotyped rabies virus**. 
We performed a local injection in the vlPAG or lPAG of AAVs conditionally 
delivering rabies glycoprotein (AAV(2/9)CAG-flex-rabiesG-2A-H2B-10xV5-tag, short: 
AAV-flex-rabiesG)° and TVA (AAV(2/9)CAG-flex-TVA-2A-H2B-eGFP, short: 
AAV-flex-TVA)’ into either Vglut2-ires-Cre or Gad2-ires-Cre mice. Two weeks 
later, we injected EnvA-ΔG-mCherry-rabies* into the same location. Mice were 
killed 7 days thereafter, transcardially perfused with 4% paraformaldehyde in PBS, 
and brains were extracted and processed for histology as described above. Brain 
sections corresponding to the injection site were stained for GFP, red fluorescent 
protein (RFP) and V5, and counterstained with DAPI. Brain sections outside the 
injection site were stained for RFP and NeuN. To compare the relative input to the 
different PAG subpopulations, we quantified the number of starter cells within 
the PAG (mCherry, GFP and V5 triple-positive cells) in four sections around the 
injection site and we counted the number of mCherry+ cells within the entire CEA. 
Images were acquired with an Olympus confocal microscope (FV 1000) with a 
motorized stage, using a ×20 objective, a 3 × 3 tiled scan with 15% overlap and a 
step size of 1.5 µm along the depth of the slice. Triple-positive cells in the PAG and 
mCherry+ cells in CEA were counted manually along every confocal plane using 
Imaris software (Bitplane). To quantify the hypothalamic input to PAG 
subpopulations, we analysed approximately half the volume of the hypothalamus by imaging 
every other section along the rostro-caudal axis with an automated slide scanning 
microscope (Zeiss Axioscan Z1) using a ×10 objective. mCherry+ neurons were 
counted manually on the image projection. We use the following nomenclature 
for the hypothalamic nuclei: dorsomedial hypothalamic nucleus (DM), anterior 
hypothalamus, medial part (AHM, including the entire anterior hypothalamic area, 
the latero-anterior hypothalamic nucleus and the paraventricular hypothalamic 
nucleus), ventromedial hypothalamic nucleus (VMH), posterior hypothalamic 
nucleus (PH), peduncular part of lateral hypothalamus (PLH, including the 
Mc of the lateral hypothalamus), lateral hypothalamic area (LH, including the 
parasubthalamic nucleus), premammillary nucleus (PMD). All quantifications 
were performed by an experimenter blind to the subject’s genotype. 

To study PAG projections to brainstem neurons directly connected to spinal 
motor neurons (pre-motor neurons), we combined an anterograde vlPAG injection 
of an AAV expressing presynaptic fluorescent markers and monosynaptic rabies 
spreading from spinal motor neurons. To label pre-motor neurons, we injected a 
monosynaptic rabies virus into forelimb muscles that retrogradely infected the 
corresponding spinal motor neurons. To allow for spreading of the rabies virus in 
adult mice, we complemented cervical motor neurons with an AAV conditionally 
expressing rabies glycoprotein through an intra-spinal injection in ChAT-Cre mice. 
First, we injected into the vlPAG of ChAT-Cre adult mice a custom-made AAV that 
unconditionally expressed GFP-tagged synaptophysin (Syn) in presynaptic 
terminals (AAV(2/9)/CAG-flex-SynGFP + AAV(2/9)/CMV-Cre)*”. In the same surgery 
session, we injected AAV(2/9)CAG-flex-rabiesG-2A-H2B-10xV5-tag”® into the 
cervical part of the spinal cord. Two weeks thereafter, G-deleted rabies virus coated 
with the CVS-glycoprotein was injected in triceps and biceps muscles. Eight days 
after rabies injection, mice were killed and brains were immunostained against 
GFP, RFP and vGluT2 and counterstained with DAPI. High-resolution 
three-dimensional images of eight complete pre-motor cells in the Mc were acquired on 
a custom-made dual motorized spinning-disk microscope (Life Imaging Services) 
using a ×63 objective, 8 × 8 tile scan and 0.2 µm step size. We quantified the 
number of GFP+ (vGluT2+) vlPAG inputs to soma and dendritic tree of these cells 
manually in eight neurons from three mice. The Mc was defined as described 
before” and included lateral paragigantocellular, as well as ventral and alpha parts 
of the gigantocellular, reticular nucleus. 

To assess local vlPAG inputs specifically onto Mc-projecting glutamatergic 
neurons, we used an intersectional viral approach (Extended Data Fig. 4f-h). We 
injected a retrogradely transported HSV that Cre-dependently delivered rabies 
G-protein (HSV/hEF1α-LS1L-rabiesG; R. Neve) into the Mc of offspring from 
Vglut2-ires-Cre crossed with Gad1-eGFP mice. In the same surgery session, we 
injected conditional AAV into the vlPAG to virally express the TVA receptor in 
glutamatergic cells (rAAV(2/9)CAG-flex-TVA-2A-H2B-eGFP, custom designed, 
Vector Biolabs). Two weeks later, we injected EnvA-ΔG-mCherry-rabies into the 
Mc. Mice were killed 7 days later and brains processed as described above. We then 
used standard immunohistochemistry to stain for GFP, RFP and rabies G-protein. 
High-resolution confocal images (×40) of vlPAG were taken from three coronal 
sections of the PAG (rostral, medial, caudal) and individual cells were manually 
counted using Imaris software (Bitplane). Triple-positive cells were identified as 
starter cells, whereas GFP and RFP double-positive cells represented local GAD1+ 
cells connected to glutamatergic Mc-projecting neurons. 

We performed several controls to demonstrate the specificity of the 
monosynaptic rabies tracing technology. To check for the specificity of the AAV 
viruses delivering the TVA receptor (AAV-flex-TVA) and rabiesG protein 
(AAV-flex-rabiesG), we injected those viruses in the vlPAG of offspring from 
Vglut2-Cre and Tau-lox-stop-lox-SynGFP-IRES-nlsLacZpA reporter mouse lines. After 
2 weeks, animals were killed and the injection site was cut, immunostained and 
co-localization of GFP or V5 with β-Gal was analysed. In an additional 
experiment, Vglut2-ires-Cre animals were injected with AAV-flex-TVA followed by 
EnvA-ΔG-mCherry-rabies 2 weeks later. Furthermore, to test for the leakiness 
of the EnvA-ΔG-mCherry-rabies, we injected this virus in the vlPAG of 
wild-type animals combined with latex beads. Animals were killed 1 week later and 
the entire PAG, amygdala and hypothalamus were cut and stained for mCherry 
and NeuN. Lastly, wild-type animals were injected with AAV-flex-TVA, 
AAV-flex-rabiesG and EnvA-ΔG-mCherry-rabies following the same protocol used 
for monosynaptic tracing experiments. 

To characterize the connectivity from dlPAG to vlPAG, we injected AAV(2/9) 
CAG-flex-Synaptophysin-Myc™ into offspring of Vglut2-ires-Cre mice crossed 
with Gad1-eGFP mice (n = 4 mice) to visualize glutamatergic dlPAG contacts 
onto GFP+ GABAergic vlPAG neurons. To quantify opposing synaptic contacts, 
we selected neurons with complete soma and acquired ×60 confocal images at a 
step size of 0.2 µm with a confocal microscope (Olympus FV 1000). Quantification 
of synaptic inputs from ipsilateral (n = 21 cells from four mice) and contralateral 
(n = 13 cells from two mice) hemispheres was performed manually using Imaris 
software (Bitplane). 

Single-unit recordings. Custom-built, chronically implanted 16-wire electrodes** 
were used to record electrical activity in the vlPAG. Electrodes were connected to 
a headstage (Plexon) containing 16 unity-gain operational amplifiers. The 
headstage was connected to a 16-channel preamplifier (gain ×100, bandpass filter from 
150 Hz to 9 kHz for unit activity, Plexon). Spiking activity was digitized at 40 kHz, 
bandpass filtered from 250 Hz to 8 kHz, and isolated by time-amplitude window 
discrimination and template matching using a Neural Data Acquisition System 
(Omniplex, Plexon). Single-unit spike sorting was performed using Off-Line Spike 
Sorter (OFSS, Plexon) for all behavioural sessions. Principal-component scores 
were calculated for unsorted waveforms and plotted in a three-dimensional 
principal-component space; clusters containing similar valid waveforms were manually 
defined. A group of waveforms was considered to be generated from a single 
neuron if the waveforms formed a discrete, isolated cluster in the 
principal-component space and did not contain a refractory period less than 1 ms, as assessed 
using auto-correlogram analyses. To avoid analysis of the same neuron recorded on 
different channels, we computed cross-correlation histograms. If a target neuron 
presented a peak of activity at a time that the reference neuron fired, only one of the 
two neurons was considered for further analysis. Spike timestamps were analysed 
(Neuroexplorer 5, Nex Technologies) to calculate average firing rates and z-score 
transformations of cells depending on behavioural parameters: that is, within and 
outside freezing episodes. Optical identification of single units was performed 
as previously described“. Briefly, laser light pulses of 100-300 ms duration were 
used to evoke spiking activity. Short latencies (GAD2+ cells: <15 ms; vGluT2+ 
cells: <10 ms) of reliable light-evoked spiking were considered to indicate direct 
light activation and, thus, allowed for identification of the cell type. To correlate 
single-unit activity with freezing behaviour, we calculated z-scores 2 s before and 
after onset of freezing during the entire recall session for recordings of both 
unidentified and optically identified single units. 
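
A peri-event z-score of the kind described above can be sketched as follows. The text does not specify the bin width or the reference period used for the z-transformation, so this example makes explicit assumptions: firing rates are binned around each freezing onset, averaged across onsets, and z-scored against the mean and standard deviation of the pre-onset bins. All names and defaults are hypothetical.

```python
from statistics import mean, pstdev


def binned_rate(spike_times, event_time, window_s=2.0, bin_s=0.5):
    """Firing rate (Hz) per bin from event_time - window_s to event_time + window_s."""
    n_bins = int(round(2 * window_s / bin_s))
    counts = [0] * n_bins
    start = event_time - window_s
    for t in spike_times:
        idx = int((t - start) // bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [c / bin_s for c in counts]


def zscore_around_onset(spike_times, onsets, window_s=2.0, bin_s=0.5):
    """Z-scored mean firing rate around freezing onsets.

    Assumption: the pre-onset half of the window serves as the baseline.
    """
    per_event = [binned_rate(spike_times, t, window_s, bin_s) for t in onsets]
    avg = [mean(col) for col in zip(*per_event)]
    n_pre = len(avg) // 2
    mu = mean(avg[:n_pre])
    sd = pstdev(avg[:n_pre])
    if sd == 0:
        raise ValueError("baseline firing rate has zero variance")
    return [(r - mu) / sd for r in avg]
```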

Statistics. The experiments were not randomized. No statistical methods were 
used to predetermine sample size. Data presented in box–whisker plots indicate 
medians, interquartile range and 5th–95th percentiles. Motion data are presented 
as s.e.m. range. All other data are presented as means ± s.e.m. Statistical analyses 
were performed in GraphPad Prism 6.0a or using R. Normality was assessed 
using Shapiro-Wilk tests. Whenever the normality test failed, non-parametric 
Mann-Whitney or Wilcoxon signed-rank (for repeated measures) tests were 
used for pairwise comparisons. Within-subject group analysis of 
non-parametric data was performed using Friedman’s test with a post-hoc Dunn’s multiple 
comparisons test. Between-subject group analysis of non-parametric data was 
done with Kruskal-Wallis statistics and a post-hoc Dunn’s multiple 
comparisons test. Variance in normally distributed data sets was analysed with one-way 
ANOVA and Tukey’s or Sidak’s post-hoc tests. Significance levels are indicated 
as follows: *P < 0.05; **P < 0.01; ***P < 0.001. See Supplementary Information 
for statistics table. 
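
The test-selection rules in this paragraph can be summarized as a small decision function. This is only an illustrative summary, not code from the study: the function name is hypothetical, the normality outcome is passed in as a boolean (for example, from a Shapiro-Wilk test run elsewhere), and the use of Student's t-test for normally distributed pairwise comparisons is taken from the figure legends rather than from this paragraph.

```python
def choose_test(normal, n_groups, repeated):
    """Name the test implied by the Statistics paragraph.

    normal   -- True if the normality test passed
    n_groups -- number of groups or conditions being compared
    repeated -- True for within-subject (repeated-measures) designs
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if normal:
            return "paired Student's t-test" if repeated else "unpaired Student's t-test"
        return "Wilcoxon signed-rank test" if repeated else "Mann-Whitney test"
    if normal:
        return "one-way ANOVA with Tukey's or Sidak's post-hoc test"
    if repeated:
        return "Friedman's test with Dunn's post-hoc test"
    return "Kruskal-Wallis test with Dunn's post-hoc test"
```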


51. Franklin, K. B. J. & Paxinos, G. Atlas of the Mouse Brain 4th edn (Academic, 
2001). 
Extended Data Figure 1 | a, Expression of ChR2 throughout vlPAG in 
consecutive coronal brain sections (ref. 51). b, c, Fibre placements in 
Vglut2-ires-Cre mice of experimental and control groups. d, Supplementary Video 
stills with superimposed representative movement tracks during the 
snake open-field test for unconditioned defensive responses. e, Colour-coded 
plot for a mouse’s motion before, during and after light-mediated 
inhibition of vlPAG glutamatergic neurons expressing Arch during a 
snake open-field test session. f, Freezing responses plotted against mouse-[…] 
correlation. g, Effects of light-mediated inhibition of vlPAG glutamatergic 
neurons on anxiety-like behaviour in the open field test with no snake 
present. Inhibition of vlPAG glutamatergic cells resulted in enhanced 
track length (n = 6 mice, paired two-tailed Student's t-test) and more 
frequent visits to the centre of the open field (n = 6 mice, paired two-tailed 
Student's t-test). h, Example of an entire snake open-field test session, with 
track length, freezing episodes and mouse-snake distance. Values are 
means ± s.e.m. 
afferents from the CEA (upper traces), and blockage of eIPSCs by PTX 
(lower traces). d, Targeting of fluorescently labelled vGluT2+ neurons for 
[…] (scale bar, 100 µm). j, k, Leakiness analysis of the combined AAV and 
EnvA-ΔG-mCherry-rabies tracing system in wild-type mice (scale bar, 
Extended Data Figure 3 | a, Electrode placements for the recordings of 
unidentified single units. b, Placements of optrodes for the recordings 
of identified neurons. c, Raster-frequency plot of an optically identified 
vlPAG vGluT2+ neuron. d, A glutamatergic neuron exhibiting marked 
optical activation during constant illumination for 20 s. e, Identified 
vGluT2+ neurons (n = 6) showed both increased and decreased activity 
during freezing. f, g, Fibre placements in Gad2-ires-Cre mice expressing 
inhibitory or excitatory optical actuators. h, i, Activation of vlPAG 
GAD2+ neurons resulted in reduced conditioned contextual freezing 
(n = 12 ChR2, n = 7 control, paired two-tailed Student's t-test) and lower 
innate freezing levels during the snake open-field test (n = 10 ChR2, n = 8 
control, two-tailed Wilcoxon signed-rank test). j, Optical activation of 
vlPAG GAD2+ neurons had no effect on freezing in naive mice (n = 8 
ChR2, n = 8 control, two-tailed Wilcoxon signed-rank test). Box–whisker 
plots indicate median, interquartile range, and 5th–95th percentiles of the 
distribution. *P < 0.05. 
Extended Data Figure 4 | a, Projection pattern of glutamatergic vlPAG 
axonal inputs to the rostral medulla. Terminals of vGluT2+ vlPAG 
projection neurons were labelled by AAV-mediated expression of GFP 
fused to presynaptic marker synaptophysin (top; scale bar, 400 µm), and 
ChR2-mCherry expression was visualized using immunohistochemistry 
(bottom). b, Concomitant AAV-mediated expression of Syn-GFP 
in vlPAG neurons (left panel, scale bar, 200 µm) labelled presynaptic 
terminals within Mc (right panel, scale bar, 100 µm). c, High-resolution 
image of a retrogradely traced Mc pre-motor neuron (rabies-mCherry) 
and SynGFP+ vlPAG inputs (top), and visualization of identified 
glutamatergic synaptic contacts (bottom; scale bar, 10 µm). d, e, Density of 
vGluT2+ vlPAG synaptic inputs to pre-motor neurons in Mc (n = 8 cells 
from three mice), and quantification of their distribution between the 
dendritic or somatic compartment. f, Intersectional EnvA-ΔG-mCherry-rabies 
tracing approach to identify local GABAergic inputs to 
vlPAG-to-Mc-projecting glutamatergic cells. g, TVA-, rabiesG- and 
EnvA-ΔG-mCherry-rabies triple-positive cells were identified as starter cells (left 
panels), while GAD1 and EnvA-ΔG-mCherry-rabies double-positive cells 
indicated presynaptic GABAergic neurons (right panels; scale bar, 20 µm). 
h, Quantification of GAD1+ or GAD1− presynaptic cells (n = 2 mice; dark 
grey, rabies+/GAD1+; light grey, rabies+/GAD1−). i, Example picture of 
glutamatergic vlPAG neurons retrogradely traced from the Mc, expressing 
ChR2 in presence of Cre and Flp recombinase (scale bar, 200 µm). 
j, Analysis of viral efficacy and leakiness. Overlaps of ChR delivered by 
AAV-CreONFlpON-ChR2, HSV-delivered Cre-dependent Flp-mCherry 
and Cre-dependent β-Gal were quantified in Vglut2::LacZ reporter mice. 
k, Fibre placement in Vglut2-ires-Cre mice expressing ChR2 in glutamatergic 
vlPAG-to-Mc projection neurons. l, Example of an entire session of light 
activation of glutamatergic vlPAG-to-Mc projection neurons, with the 
mouse’s motion, cumulative track length and light-induced freezing bouts. 
Values are means ± s.e.m. 
Extended Data Figure 5 | a, Expression of ChR2 throughout PAG 
in consecutive coronal brain sections (left), and fibre placements in 
Vglut2-ires-Cre mice of experimental and control groups (right). b, Light-evoked 
effect on freezing behaviour induced by activation of different 
glutamatergic subpopulations of PAG neurons in naive animals (n = 12 
[…] mice for each vlPAG and dl/lPAG) and GAD2+ (n = 4 mice) neurons 
within vlPAG or dl/lPAG. While CEA preferentially targets GAD2+ 
neurons of the vlPAG (1 × 3 ANOVA, F(2,6) = 21.67, P < 0.01, Tukey’s 
post-hoc test), vGluT2+ neurons of the dl/lPAG receive stronger inputs 
from PMD (1 × 3 ANOVA, F(2,7) = 287, P < 0.0001, Tukey’s post-hoc test). 
Extended Data Figure 6 | Schematic representation of the freezing 
pathway. PN, projection neuron; IN, interneuron; MN, motor neuron. 
Acetate mediates a microbiome–brain–β-cell 
axis to promote metabolic syndrome 


Rachel J. Perry!, Liang Peng!, Natasha A. Barry, Gary W. Cline!, Dongyan Zhang’, Rebecca L. Cardonel, Kitt Falk Petersen!, 
Richard G. Kibbey!®, Andrew L. Goodman?* & Gerald I. Shulman!*+5.° 


Obesity, insulin resistance and the metabolic syndrome are associated with changes to the gut microbiota; however, 
the mechanism by which modifications to the gut microbiota might lead to these conditions is unknown. Here we show 
that increased production of acetate by an altered gut microbiota in rodents leads to activation of the parasympathetic 
nervous system, which, in turn, promotes increased glucose-stimulated insulin secretion, increased ghrelin secretion, 
hyperphagia, obesity and related sequelae. Together, these findings identify increased acetate production resulting from a 
nutrient-gut microbiota interaction and subsequent parasympathetic activation as possible therapeutic targets for obesity. 


Previous studies have shown that both increases!~* and decreases®” in 
plasma and faecal short-chain fatty acid (SCFA) concentrations can 
be associated with overfeeding, obesity and the metabolic syndrome. 
However, whether and how alterations in SCFAs play a causal role in 
the development of obesity is unknown. Because plasma SCFA con- 
centrations may not fully represent the SCFA load presented to the 
body, we developed a method to measure whole-body turnover rates of 
acetate, propionate, and butyrate by gas chromatography–mass 
spectrometry (GC-MS; as described in the Supplementary Methods) and 
found that, in contrast to propionate and butyrate, whole-body ace- 
tate turnover as well as plasma and faecal acetate concentrations were 
markedly increased in insulin-resistant rats after 3 days or 4 weeks on 
a high-fat diet (HFD) (Fig. 1a, b and Extended Data Fig. 2a-j). 

Next we sought to determine the source of the increased acetate turn- 
over in HFD-fed rats. We measured tissue acetate concentrations and 
dilution of the 13C-acetate label during an infusion of [13C]acetate, and 
found each to be increased in the luminal contents of the caecum and 
ascending colon, with HFD-fed rats exhibiting more than a twofold 
increase in total acetate in the caecum and colon as well as in the brain 
and in [13C]bicarbonate incorporation into [13C]acetate compared to 
chow-fed rats (Figs 1c, d, 2a and Extended Data Fig. 2k). In order to 
determine conclusively the source of the increased acetate production 
in HFD-fed rats, we conducted four independent in vivo experiments to 
distinguish colon lumen acetate production from production by the rest 
of the body: we 1) washed out the contents of the caecum and colon with 
a saline flush; 2) ligated the portal vein below the splenic juncture; 3) 
treated rats with poorly absorbed broad-spectrum oral antibiotics; and 4) 
performed an acute colectomy. Each of these interventions reduced 
whole-body acetate turnover by 75-90% (Fig. 2b-f). Together, these 
data strongly suggest that the gut microbiota is the source of most of 
the increase in endogenous acetate production in HFD-fed rats. We 
next showed that faecal material can generate acetate in vitro from 
[13C]glucose or [13C]fatty acids, and that boiling or irradiating the faeces 
prevents the production of acetate, suggesting a role for faecal microbes 
in generating acetate (Extended Data Fig. 2l-n). Consistent with this 
hypothesis, treatment of the faecal material with either or both of the 
broad-spectrum antibiotics vancomycin and gentamycin markedly 
reduced acetate production (Extended Data Fig. 2o). 


Acetate drives insulin secretion 

Next we examined glucose-stimulated insulin secretion (GSIS) during 
a hyperglycaemic clamp and measured marked increases in GSIS in 
3-day and 4-week HFD-fed rats (Fig. 3a and Extended Data Fig. 3a–c). 
To determine whether the associated increases in acetate turnover 
drove this increased GSIS, we performed hyperglycaemic clamps in 
chow-fed rats given intra-arterial infusions of acetate to match whole- 
body acetate turnover to that measured in HFD-fed rats. Acetate 
infusion in chow-fed rats replicated the increases in GSIS measured in 
HFD-fed rats (Fig. 3b, c and Extended Data Fig. 3d–g), strongly 
implicating increased acetate turnover in driving acute increases in GSIS in 
HFD-fed rodents. In contrast, supplementing butyrate in chow-fed rats 
to match the turnover rates observed in HFD-fed rats had no effect on 
GSIS (Extended Data Fig. 3h-m). 

To evaluate further the effects of alterations in food intake on gut 
acetate production, we starved 4-week HFD-fed rats for 48 h and found 
that this intervention resulted in ~50% reductions in whole-body 
acetate turnover and in GSIS; however, replacing acetate by arterial 
infusion of 20 µmol kg⁻¹ min⁻¹ acetate resulted in restoration of GSIS
in rats after 48 h food deprivation (Extended Data Fig. 4a-f). Next we
performed a series of dietary interventions to assess whether simple 
caloric excess or variations in nutrient composition®'° were respon- 
sible for the increased acetate turnover measured in HFD-fed rats. 
Pair-feeding with isocaloric portions of chow or HFD produced no
change in acetate turnover or GSIS, whereas dietary interventions 
resulting in increased caloric intake increased acetate turnover and 
GSIS proportionally to the total calories consumed (Extended Data 
Fig. 4g-n; R² = 0.90). To examine the role of the gut microbiota in
acetate-induced hyperinsulinaemia, we treated HFD-fed rats with 
broad-spectrum, poorly absorbable oral antibiotics and measured a 
70% reduction in GSIS during a hyperglycaemic clamp. This reduction 
in GSIS was acutely reversed by infusion of acetate to match plasma 
acetate turnover in HFD-fed rats (Fig. 3d, e and Extended Data 
Fig. 4o-s).
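
The proportionality between caloric intake and acetate turnover is summarized above by a regression with R² = 0.90. The following is a minimal sketch of how such a coefficient of determination is obtained from an ordinary least-squares line. The function name and the data points are hypothetical placeholders, not values from the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical per-group means: caloric intake (kcal per day) and
# whole-body acetate turnover (umol kg^-1 min^-1).
calories = [60.0, 75.0, 90.0, 105.0, 120.0]
turnover = [6.0, 9.5, 11.0, 16.0, 18.5]

slope, intercept, r2 = linear_fit(calories, turnover)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R^2 = {r2:.2f}")
```

An R² close to 1 means that most of the between-group variance in turnover is accounted for by total calories consumed, which is the sense in which the text describes the increase as proportional.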

To establish a causal relationship between the microbiota and 
GSIS, we next transferred faecal material from chow- or HFD-fed 
donor rats to chow- or HFD-fed recipients. Consistent with previous
reports*!'-4, culture-independent 16S rRNA sequencing of donor 
Haven, Connecticut 06510, USA. 3Microbial Sciences Institute, Yale University School of Medicine, New Haven, Connecticut 06516, USA. “Howard Hughes Medical Institute, Yale University School
of Medicine, New Haven, Connecticut 06519, USA. Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen 2200, Denmark. *Department of 
Cellular & Molecular Physiology, Yale University School of Medicine, New Haven, Connecticut 06510, USA. 
Figure 1 | HFD-fed rats exhibit increased whole-body acetate turnover.
a, b, Plasma acetate concentrations and whole-body acetate turnover in
chow-fed and 3-day or 4-week HFD-fed rats. c, Acetate content in the
entire caecum and colon lumen. d, Tissue acetate concentrations. In all
panels, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 versus
chow-fed rats; §§§P < 0.001, §§§§P < 0.0001 versus 3-day HFD-fed rats
by one-way ANOVA with Bonferroni’s multiple comparisons test (a, b)
or by two-tailed unpaired Student’s t-test (c, d). Data are the mean ± s.e.m.
of n = 6 animals per group.


and recipient faecal microbiomes revealed an increase in the rela- 
tive abundance of bacteria belonging to the phylum Firmicutes and 
a decrease in the relative abundance of representatives of the phylum 
Bacteroidetes in fresh faecal pellets from HFD-fed donors relative 
to chow-fed donors, and faecal transplantation altered the recipient 
animal microbiome to resemble that of the donor (Extended Data 
Fig. 5a-f). Notably, these faecal transplantations also transferred the 
corresponding acetate turnover, faecal acetate, and GSIS from the 
donor group to the recipient group (Fig. 3f, g and Extended Data 
Fig. 6a—e). However, transplantation of microbiota from chow-fed 
donors into chow-fed recipients by an identical procedure did not 
alter microbiota or metabolic phenotypes (Fig. 3f, g and Extended 
Data Figs 5a-f, 6a-e). 

Having found a strong causal relationship between acetate turnover 
and GSIS, we next examined the mechanism by which increased 
acetate turnover caused increased GSIS. We first investigated whether 
acetate could stimulate GSIS through a direct effect on β-cells, perhaps
by increasing acetylcarnitine concentrations''!°. However, we found 
that neither acetate nor acetylcarnitine stimulated GSIS in isolated islet 
perifusions, ruling out a direct effect on β-cells (Fig. 3h and Extended
Data Fig. 6f-h). In addition, concentrations of β-cell stimulatory
amino acids and plasma glucagon were unchanged or reduced in the 
acetate-infused rats (Extended Data Fig. 6i-l). A small (about 2 pM)
but significant (P< 0.05) increase in plasma glucagon-like peptide-1 
(GLP-1) concentration was measured in rats after 120 min of ace- 
tate infusion (Extended Data Fig. 6m). Because GLP-1 can stimulate 
GSIS!*!3, we treated acetate-infused rats with a GLP-1 inhibitor; this 
treatment produced no change in GSIS (Extended Data Fig. 6n-s), 
demonstrating that these small changes in GLP-1 were not responsible 
for the increased GSIS in acetate-infused rats. 


Acetate drives GSIS via parasympathetic input 

As parasympathetic input is a well-known stimulator of β-cell insulin
secretion4, we next measured plasma gastrin concentrations as a
marker of parasympathetic activity in rats acutely infused with ace- 
tate. Plasma gastrin increased threefold after 60 min of infusion with 
Figure 2 | The contents of the colonic lumen are the primary source of
acetate in HFD-fed rats. a, Tissue and plasma [¹³C]acetate enrichment
in rats infused with [¹³C]acetate. b, Whole-body acetate turnover before
and after washout of the gut. **P < 0.01, ***P < 0.001 versus before
washout. c, Whole-body acetate turnover in HFD-fed rats before and
after portal vein ligation. d, e, Whole-body acetate turnover, and acetate
in the entire caecum and colon lumen. f, Whole-body acetate turnover in
HFD-fed rats before and after acute colectomy. In all panels, ***P < 0.001,
****P < 0.0001 by two-tailed unpaired Student’s t-test. Data are the
mean ± s.e.m. of n = 6 replicates per group.


20 µmol kg⁻¹ min⁻¹ acetate (Fig. 4a). Increases in brain acetate concentrations
in the acetate-infused animals confirmed the ability of acetate
infused systemically to enter the brain circulation (Fig. 4b). Because 
vagal stimulation has been shown to drive, and vagotomy has been 
shown to suppress, basal and glucose-stimulated insulin secretion!®7°, 
we hypothesized that vagotomy would reduce GSIS in acetate-infused 
rats. Consistent with this hypothesis, vagotomized rats infused with ace- 
tate exhibited an approximately fourfold reduction in plasma insulin 
concentrations throughout a hyperglycaemic clamp without any change 
in plasma glucagon concentrations, when compared with intact rats 
infused with acetate (Fig. 4c and Extended Data Fig. 7a—h). In addition, 
treatment with the parasympathetic blocker atropine before the ace- 
tate infusion abolished the ability of acetate to stimulate GSIS, without 
any effect on plasma glucagon concentrations (Fig. 4d and Extended 
Data Fig. 7i-n), replicating prior studies demonstrating that atropine 
can suppress basal and glucose-stimulated insulin secretion indirectly 
in vitro and in vivo!?'~*°. To test whether the effect of parasympathetic
stimulation of GSIS is centrally mediated, we administered acetate by 
intracerebroventricular (ICV) injection at a dose chosen to increase 
cerebrospinal fluid acetate concentrations by 200 µM, mimicking the
increases in plasma acetate concentrations caused by intra-arterial infusion
of 20 µmol kg⁻¹ min⁻¹ acetate. ICV acetate tripled GSIS during a
hyperglycaemic clamp without inducing any difference in systemic acetate
concentrations; however, this effect was blocked by treatment with
atropine, and was independent of changes in plasma glucagon concentrations
(Fig. 4e, f and Extended Data Fig. 8a-d), suggesting that acetate
Figure 3 | Acetate turnover drives GSIS. a, Plasma insulin in a
hyperglycaemic clamp. *P < 0.05, **P < 0.01, ***P < 0.001 versus chow-fed
rats; §P < 0.05 versus 3-day HFD-fed rats. b, c, Acetate turnover and
GSIS in rats given acute acetate. *P < 0.05, **P < 0.01, ****P < 0.0001
versus 2 µmol kg⁻¹ min⁻¹; §P < 0.05, §§P < 0.01, §§§§P < 0.0001 versus
8 µmol kg⁻¹ min⁻¹. d, e, Acetate turnover and GSIS in rats treated with
broad-spectrum, non-absorbable oral antibiotics. *P < 0.05, **P < 0.01,
****P < 0.0001 versus controls; §P < 0.05, §§P < 0.01, §§§P < 0.001,
§§§§P < 0.0001 versus antibiotic-treated rats. f, g, Whole-body acetate
turnover and GSIS. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001
versus chow-fed donor and chow-fed recipient; §§P < 0.01, §§§P < 0.001,
§§§§P < 0.0001 versus chow-fed donor and HFD-fed recipient.
h, GSIS in isolated islets (KRB buffer; n = 4 replicates per group). Data
show mean ± s.e.m. Groups were compared by one-way ANOVA with
Bonferroni’s multiple comparisons test (a-g) or by two-tailed unpaired
Student’s t-test (h). Unless otherwise specified, n = 6 rats per group.


acts centrally to increase GSIS. Because atropine has also been shown to
act directly on β-cells to suppress insulin secretion, we infused rats with
intra-arterial acetate and examined the effect of ICV methylatropine,
an atropine analogue that does not cross the blood-brain barrier. 
Consistent with acetate driving GSIS via the parasympathetic nervous 
system, methylatropine fully abrogated the ability of acetate to drive 
GSIS (Fig. 4g, h and Extended Data Fig. 8e-i). To confirm that activation
of the parasympathetic nervous system drives this effect, we injected
the same dose of acetate into the nucleus tractus solitarius, and found
that this intervention replicated the effects of systemic and ICV acetate
on GSIS by driving parasympathetic outflow as indicated by a tenfold
increase in plasma gastrin concentrations without any change in systemic
plasma acetate or glucagon (Fig. 4i, j and Extended Data Fig. 8j-o).
Together, these data conclusively demonstrate that the acetate-induced
increase in GSIS occurs through parasympathetic activation.

Figure 4 | Acetate drives increased GSIS via parasympathetic activation.
a, Plasma gastrin. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001
versus 2 µmol kg⁻¹ min⁻¹ acetate; §P < 0.05 versus 8 µmol kg⁻¹ min⁻¹
acetate. b, Tissue acetate. c, d, GSIS. e, f, Plasma acetate and GSIS.
*P < 0.05, **P < 0.01, ***P < 0.001 versus controls; §P < 0.05, §§P < 0.01,
§§§P < 0.001 versus ICV acetate. g, h, Plasma gastrin (120 min) and GSIS.
*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 versus controls;
§P < 0.05, §§P < 0.01, §§§P < 0.001, §§§§P < 0.0001 versus acetate.
i, j, Plasma gastrin (120 min) and GSIS following acetate injection into
the nucleus tractus solitarius. Data are the mean ± s.e.m. of n = 6 animals
per group, with groups compared by ANOVA with Bonferroni’s multiple
comparisons test (a, e-h) or by two-tailed unpaired Student’s t-test (b-d, i, j).
In b-d, i, and j, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Figure 5 | Chronic increases in whole-body acetate turnover promote
hyperphagia, obesity, and metabolic syndrome. a, Plasma insulin during
a hyperglycaemic clamp. b, c, Plasma gastrin and ghrelin at time 0 of
the hyperglycaemic clamp. d, Weight change during the 10-day infusion
(n = 16 controls, 16 acetate, and 12 acetate plus vagotomy). e, f, Liver and
skeletal muscle triglyceride content. g, Endogenous glucose production
during a hyperinsulinaemic-euglycaemic clamp. h, Glucose disposal
rate during the clamp. All data are mean ± s.e.m. In all panels, *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001 versus controls; §P < 0.05,
§§P < 0.01, §§§P < 0.001, §§§§P < 0.0001 versus acetate-treated rats by
one-way ANOVA with Bonferroni’s multiple comparisons test. n = 6
replicates unless otherwise stated.


Chronic increases in acetate drive obesity

We next investigated whether a chronic increase in acetate turnover 
would promote chronic hyperinsulinaemia, hyperphagia, and weight 
gain and the associated sequelae of obesity. To answer this question, 
we performed continuous intragastric acetate infusions for 10 days, 
treating chow-fed rats with 20 µmol kg⁻¹ min⁻¹ acetate to mimic the
increase in gut microbial acetate production measured in HFD-fed 
rats (Extended Data Fig. 9a, b). Rats that received chronic intragastric 
acetate infusions exhibited increased insulin secretion during both a 
hyperglycaemic clamp and an intraperitoneal glucose tolerance test; this 
increase in insulin secretion was associated with a fivefold increase in 
plasma gastrin concentration (Fig. 5a, b and Extended Data Fig. 9c-h). 
All of these effects were prevented by vagotomy. Consistent with the 
hypothesis that chronic postprandial hyperinsulinaemia leads to 
increased weight gain, rats that received chronic intragastric acetate 
infusions exhibited more than a doubling in daily caloric intake and 
in weight gain over the ten-day infusion, which may be attributable, at 
least in part, to a threefold increase in plasma ghrelin concentrations 
(Fig. 5c, d and Extended Data Fig. 9i, j). These effects were also pre- 
vented by vagotomy, demonstrating that parasympathetic activation is 
necessary to mediate the effects of chronic acetate on GSIS in awake, 
unrestrained rats. Finally, acetate-infused rats exhibited insulin resist- 
ance, as indicated by impaired glucose disposal and impaired insulin 
suppression of hepatic glucose production during a hyperinsulinae- 
mic-euglycaemic clamp, and increases in plasma, liver, and skeletal 
muscle triglyceride content without any changes in plasma glucagon 
concentrations (Fig. 5e-h and Extended Data Fig. 9k-p). Vagotomized
rats exhibited none of these consequences of acetate infusion. 
Together, these findings strongly suggest that the gut microbiota are 
responsible for generating increased acetate turnover and driving obe- 
sity in HFD-fed rats, although we cannot rule out the possibility that 
the microbiota also modulate acetate absorption”. To conclusively test 
the hypothesis that the gut microbiota are primarily responsible for 
increasing acetate turnover in HFD-fed rodents, we measured plasma 
and colonic acetate content in germ-free mice lacking gut microbes and 
ex-germ-free mice 4 weeks after colonization with normal mouse faeces 
(conventionalized; CONV-D) fed either a regular chow diet or HFD. 
Demonstrating the role of the gut microbiota as the main producer of 
acetate in vivo, germ-free mice had negligible plasma, colonic lumen, 
and tissue acetate concentrations as compared to CONV-D mice; only 
conventionalized mice exhibited an increase in acetate concentrations 
on HFD (Fig. 6a-c). Germ-free mice fed [¹³C]bicarbonate also exhibited
strikingly lower plasma and tissue ¹³C enrichment than CONV-D
mice. Furthermore, in CONV-D mice but not germ-free mice, [¹³C]acetate
was doubled in HFD-fed relative to chow-fed animals (Fig. 6d).
As rodents do not possess the enzymes necessary to convert bicarbonate 
to acetate, these results demonstrate that the gut microbiota are responsible
for the increased acetate turnover in HFD-fed animals, a conclusion
corroborated by the fact that colonic [¹³C]acetate enrichment in
CONV-D mice was more than double the enrichment in plasma or in 
any other tissue (Extended Data Fig. 10a). In contrast, propionate and 
butyrate concentration and enrichment were minimal in plasma and tis- 
sues in all mice (Extended Data Fig. 10b-g). Finally, because of the role 
of increased acetate in promoting parasympathetic activation, we meas- 
ured plasma gastrin and ghrelin concentrations in the germ-free and 
CONV-D mice, and found that CONV-D mice exhibited two- and ten- 
fold increases in gastrin and ghrelin, respectively, which were associated 
with two- and fivefold increases in liver and skeletal muscle triglyceride 
content (Fig. 6e-h), compared with the germ-free mice. Together, these 
data clearly implicate the gut microbiota as being responsible for the 
majority of the whole-body plasma acetate turnover in vivo and for 
the increase in acetate turnover observed in HFD-fed rats. 


Conclusions 
In summary, we show here that increased acetate production due 
to a gut microbiota-nutrient interaction in HFD-fed rodents leads
to activation of the parasympathetic nervous system and results
in increased ghrelin secretion and GSIS. This generates a positive
feedback loop, resulting in hyperphagia, hypertriglyceridaemia,
ectopic lipid deposition in liver and skeletal muscle, and liver and
muscle insulin resistance (Extended Data Fig. 1). The increased
acetate production that occurs when the gut microbiota are
exposed to calorically dense nutrients may mediate an important
positive feedback loop between the gut microbiota and the CNS
that promotes hyperphagia (due to increased ghrelin secretion) and
increased energy storage as fat (due to increased GSIS) in foraging
animals when they stumble across calorically dense foodstuffs in the
wilderness. However, in the setting of chronic exposure to calorically
dense, abundant food, this gut microbiota-brain-β-cell axis promotes
obesity and its related sequelae of hyperlipidaemia, fatty liver disease
and insulin resistance.

Figure 6 | Gut bacteria are responsible for the majority of acetate
production in vivo, and for the increase in HFD-fed rodents.
a, b, Plasma and caecal/colon lumen acetate concentrations
in germ-free (GF) and CONV-D mice. c, Tissue acetate
concentrations. d, Plasma [¹³C]acetate enrichment in mice
fed water containing [¹³C]bicarbonate for 3 days.
e, f, Plasma gastrin and ghrelin concentrations.
g, h, Liver and skeletal muscle triglyceride content.
In all panels, *P < 0.05 and ****P < 0.0001. Data are the
mean ± s.e.m. of n = 9 GF mice and 10 CONV-D mice per diet.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Extended Data Figure 1 | Mechanism by which a diet-microbiota interaction drives obesity and the metabolic syndrome.


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 2 | HFD-fed rats exhibit increased gut acetate
production. a, Plasma triglycerides. b, HOMA-IR. c, Dietary acetate
concentrations. n = 2 replicates per diet. d, Faecal acetate normalized to
dry weight. e-g, Plasma propionate, whole-body propionate turnover,
and faecal propionate concentrations. h-j, Plasma butyrate, whole-body
butyrate turnover, and faecal butyrate concentrations. k, [¹³C]acetate
enrichment in plasma of rats fed [¹³C]bicarbonate-labelled food and water.
l, [U-¹³C]acetate from faeces incubated in [U-¹³C]glucose or fatty acids.
m, In vitro acetate production rate from faeces incubated in [U-¹³C]glucose
or fatty acids. n, In vitro acetate production rate in control, boiled,
and UV-irradiated faecal samples. ****P < 0.0001 versus control.
o, In vitro faecal acetate production following treatment with antibiotics.
Unless otherwise specified, *P < 0.05, **P < 0.01, ***P < 0.001,
two-tailed unpaired Student’s t-test (n). Unless otherwise specified, n = 6
replicates per group.
Extended Data Figure 3 | HFD-fed rats exhibit increased GSIS
driven by increased acetate turnover. a, b, Plasma glucose and glucose
infusion rate during a hyperglycaemic clamp. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001 versus chow-fed rats. c, Plasma insulin
area under the curve (AUC) during the hyperglycaemic clamp. d, Plasma
acetate. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 versus
2 µmol kg⁻¹ min⁻¹ acetate; §§§§P < 0.0001 versus 8 µmol kg⁻¹ min⁻¹
acetate. e, f, Plasma glucose and glucose infusion rate during a
hyperglycaemic clamp. g, Plasma insulin AUC during the clamp.
h, i, Plasma butyrate and whole-body butyrate turnover. *P < 0.05,
****P < 0.0001. j, k, Plasma glucose and glucose infusion rate during a
hyperglycaemic clamp. l, m, Plasma insulin concentrations during the
hyperglycaemic clamp, and plasma insulin AUC. In all panels, data are the
mean ± s.e.m. of n = 6 animals per group, with comparisons by one-way
ANOVA with Bonferroni’s multiple comparisons test (a-g) or two-tailed
unpaired Student’s t-test (h-m).
Extended Data Figure 4 | Increasing total caloric intake leads to
increased acetate turnover and GSIS via the microbiota in rats.
a, b, Plasma acetate and whole-body acetate turnover. c, d, Plasma glucose
and glucose infusion rate during a hyperglycaemic clamp. e, f, Plasma
insulin and insulin AUC during the clamp. g, Caloric intake from protein,
fat, and carbohydrate. In g-m, each group was compared to pair-fed,
high-carbohydrate-fed rats. h, i, Plasma glucose and glucose infusion rate
in the hyperglycaemic clamp. j, k, Plasma acetate and whole-body acetate
turnover. l, m, Plasma insulin and insulin AUC during the hyperglycaemic
clamp. n, Linear regression: whole-body acetate turnover versus total
caloric intake in each diet group. o, p, Plasma glucose and glucose infusion
rate during a hyperglycaemic clamp in 4-week HFD-fed rats treated with
broad-spectrum non-absorbable antibiotics. q, Plasma acetate. r, Plasma
[¹³C]acetate enrichment following three days of feeding [¹³C]bicarbonate
food and water. Data were compared using the two-tailed unpaired
Student’s t-test. s, Insulin AUC during a hyperglycaemic clamp. In all
panels, data are the mean ± s.e.m. of n = 6 rats per group, with groups
compared by one-way ANOVA with Bonferroni’s multiple comparisons
test, unless otherwise stated. In a-f, *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001 versus 12-h starved rats; §P < 0.05, §§P < 0.01,
§§§P < 0.001, §§§§P < 0.0001 versus 48-h starved rats. In h-m, *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001 versus pair-fed rats given the
high-carbohydrate diet. In o-s, ***P < 0.001, ****P < 0.0001 versus
HFD-fed rats; §§§P < 0.001, §§§§P < 0.0001 versus antibiotics-treated rats.
Extended Data Figure 5 | Faecal transplantation alters recipient
microbiomes to resemble their donors as revealed by culture-independent
16S rRNA sequencing of faecal microbiomes from donors
and recipients. a, Relative abundance at the phylum level. Only phyla
with relative abundance > 0.1% in at least one group are shown. Data are
the mean ± s.e.m. of n = 7-8 replicates per group; *P < 0.05 by two-tailed
unpaired Student’s t-test. b-f, Beta diversity analysis of faecal microbiomes
before and after transplantation. The largest component of variation
(captured by principal coordinate (PC)1) is shown in b and PC1-PC3 are
shown in c-f. Rats from independent litters were randomized before diet
administration or faecal transplantation. Beta diversity reflects principal
coordinates analysis based on Hellinger distances; the results from
unweighted, non-phylogenetic distance metrics and from phylogenetic
metrics (weighted and unweighted UniFrac) are similar.
Extended Data Figure 6 | The gut microbiota drive increased acetate
turnover and GSIS. a, b, Plasma glucose and glucose infusion rate during
a hyperglycaemic clamp in rats following faecal transplant replicates
acetate turnover and GSIS in the donor group. c, Plasma acetate. d, Faecal
acetate concentration. n = 7 (HFD to chow) or 8 (chow to chow, chow to
HFD) per group. e, Plasma insulin AUC. f, Glucose-stimulated insulin
release in isolated islets incubated with 400 µM acetate in a physiological
buffer. n = 4 per group. g, Plasma C2 acetylcarnitine content. h, Glucose-stimulated
insulin release in isolated islets incubated with 100 µM
during a hyperglycaemic clamp in acetate-infused rats treated with a
multiple comparisons test. In g-s, data are the mean ± s.e.m. of n = 6
Extended Data Figure 7 | Acetate drives GSIS via a CNS mechanism.
a, Body weight before and after vagotomy. b, c, Plasma glucose and glucose
infusion rate during a hyperglycaemic clamp. d, e, Plasma acetate and
whole-body acetate turnover. f, Insulin AUC during the clamp. g, Plasma
gastrin during the clamp. h, Plasma glucagon after 120 min of the clamp.
i, j, Plasma glucose and glucose infusion rate during a hyperglycaemic
clamp in acetate-infused, atropine-treated rats. k, l, Plasma acetate and
whole-body acetate turnover. m, Plasma insulin AUC during the clamp.
n, Plasma glucagon. In all panels, *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001 by the two-tailed unpaired Student’s t-test; data represent
the mean ± s.e.m. of n = 6 replicates per group.
Extended Data Figure 8 | Acetate drives GSIS via parasympathetic
activation. a, b, Plasma glucose and glucose infusion rate during a
hyperglycaemic clamp in rats treated with ICV acetate. c, Plasma insulin
AUC. d, Plasma glucagon. e, f, Plasma acetate and whole-body acetate
turnover in rats treated with systemic intra-arterial acetate and ICV
methylatropine. g, h, Plasma glucose and glucose infusion rate during a
hyperglycaemic clamp. i, Plasma insulin AUC during the clamp.
j, k, Plasma and brain tissue acetate in rats given an injection of acetate
into the nucleus tractus solitarius. l, m, Plasma glucose and glucose
infusion rate during a hyperglycaemic clamp. n, Plasma insulin AUC
during the clamp. o, Plasma glucagon. In all panels, data are the
mean ± s.e.m. of n = 6 animals per group, with comparisons by one-way
ANOVA with Bonferroni’s multiple comparisons test (a-i) or two-tailed
unpaired Student’s t-test (j-o). In b-d, **P < 0.01, ***P < 0.001
versus controls; §§P < 0.01, §§§P < 0.001 versus ICV acetate-treated
rats by one-way ANOVA with Bonferroni’s multiple comparisons test.
In e-i, ***P < 0.001, ****P < 0.0001 versus controls; §§§P < 0.001,
§§§§P < 0.0001 versus acetate-infused rats.
Extended Data Figure 9 | Chronic intragastric acetate infusion causes
hyperphagia and metabolic syndrome through parasympathetic
activation. a, b, Plasma acetate and whole-body acetate turnover.
c, d, Plasma glucose and insulin concentrations during an intraperitoneal
glucose tolerance test. e, Insulin AUC during the glucose tolerance test.
f, g, Plasma glucose and glucose infusion rate during a hyperglycaemic
clamp. h, Insulin AUC during the hyperglycaemic clamp. i, Body weight
before and after the infusion study (n = 16 controls, 16 acetate-infused,
and 12 acetate-infused and vagotomized rats). j, Caloric intake during
the 10-day acetate infusion study. k, Homeostatic model assessment of
insulin resistance (HOMA-IR). l, Plasma triglyceride concentrations.
m, Plasma insulin at the 120-min time point of a hyperinsulinaemic-euglycaemic
clamp. n, o, Plasma glucose and glucose infusion rate during
the hyperinsulinaemic-euglycaemic clamp. p, Plasma glucagon. Unless
otherwise specified, data are mean ± s.e.m. of n = 6 rats per group, with
comparisons by one-way ANOVA with Bonferroni’s multiple comparisons
test. In all panels, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001
versus controls; §P < 0.05, §§P < 0.01, §§§P < 0.001, §§§§P < 0.0001 versus
intragastric acetate-infused rats.
Extended Data Figure 10 | Germ-free mice have negligible endogenous
short-chain fatty acid production. a, Ratio of tissue:plasma [¹³C]acetate
in mice fed [¹³C]bicarbonate. b, c, Plasma and tissue propionate
concentrations. d, Plasma [¹³C]propionate enrichment. e, f, Plasma and
tissue butyrate. g, Plasma [¹³C]butyrate enrichment.

Cold, clumpy accretion onto an active supermassive black hole

Supermassive black holes in galaxy centres can grow by the
accretion of gas, liberating energy that might regulate star 
formation on galaxy-wide scales'~*. The nature of the gaseous fuel 
reservoirs that power black hole growth is nevertheless largely 
unconstrained by observations, and is instead routinely simplified 
as a smooth, spherical inflow of very hot gas*. Recent theory®’ and 
simulations* !° instead predict that accretion can be dominated by 
a stochastic, clumpy distribution of very cold molecular clouds—a 
departure from the ‘hot mode’ accretion model—although 
unambiguous observational support for this prediction remains 
elusive. Here we report observations that reveal a cold, clumpy 
accretion flow towards a supermassive black hole fuel reservoir 
in the nucleus of the Abell 2597 Brightest Cluster Galaxy (BCG), 
a nearby (redshift z= 0.0821) giant elliptical galaxy surrounded 
by a dense halo of hot plasma!'-'’. Under the right conditions, 
thermal instabilities produce a rain of cold clouds that fall towards 
the galaxy’s centre!*, sustaining star formation amid a kiloparsec- 
scale molecular nebula that is found at its core!*. The observations 
show that these cold clouds also fuel black hole accretion, revealing 
‘shadows’ cast by the molecular clouds as they move inward at about
300 kilometres per second towards the active supermassive black 
hole, which serves as a bright backlight. Corroborating evidence 
from prior observations’* of warmer atomic gas at extremely high 
spatial resolution’’, along with simple arguments based on geometry 
and probability, indicate that these clouds are within the innermost 
hundred parsecs of the black hole, and falling closer towards it. 

We observed the Abell 2597 BCG (Fig. 1) with the Atacama Large 
Millimeter/submillimeter Array (ALMA), enabling us to create a 
three-dimensional map of both the location and motions of cold gas at 
uniquely high sensitivity and spatial resolution. The ALMA receivers 
were sensitive to emission from the J =2-1 rotational line of the carbon 
monoxide (CO) molecule. Such CO(2-1) emission is used as a tracer of 
cold (~10-30 K) molecular hydrogen, which is vastly more abundant,
but not directly observable at these low temperatures. 

The continuum-subtracted CO(2-1) images (Fig. 2) reveal that the 
filamentary emission line nebula that spans the galaxy’s innermost 
~30 kpc (Fig. 1b) consists not only of warm ionized gas'**°, but also 
cold molecular gas. In projection, the optical emission line nebula is 
co-spatial and morphologically matched with CO(2-1) emission 
detected at a significance between ≥3σ (in the outer filaments) and 
≥20σ (in the nuclear region) above the background noise level. The 
warm ionized nebula is therefore likely to have a substantial molecular 
component, consistent with results for other similar galaxies?!. The 
total measured CO(2-1) line flux corresponds to a molecular hydrogen 
gas mass of M_H2 = (1.8 ± 0.2) × 10⁹ M☉, where M☉ is the mass of the 
Sun. The critical (minimum) density for CO(2-1) emission requires 
that the volume filling factor of this gas be very low, of the order of a 
few per cent. The projected spatial coincidence of both the warm ion- 
ized and cold molecular nebulae therefore supports the long-envisaged 
hypothesis that the ionized gas is merely the warm ‘skin’ surrounding 
far colder and more massive molecular cores”, whose outer regions 
are heated by intense radiation from the environment in which they 
reside. Rather than a monolithic, kiloparsec-scale slab of cold gas, 
we are more likely to be observing a projected superposition of many 
smaller, isolated clouds and filaments. 

The data unambiguously show that cold molecular gas is falling 
inward along a line of sight that intersects the galaxy centre. We know 
this because the ALMA beam that is co-spatial with the millimetre 
continuum source, the radio core, and the isophotal centre of the galaxy 
reveals strong, redshifted continuum absorption (Fig. 3b), found by 
extracting the CO(2-1) spectrum from this central beam. This reveals 
at least three deep and narrow absorption lines (Fig. 3c), with redshifted 
line centres at +240, +275, and +335 km s⁻¹ relative to the systemic 
(stellar) velocity of the galaxy, all within an angular (physical) region 
of 0.715″ × 0.533″ (1 kpc × 0.8 kpc). 
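
The angular-to-physical conversion quoted above can be checked with a short sketch. It assumes the luminosity distance of 373.3 Mpc given in the Methods and the standard relation D_A = D_L/(1 + z)²; the function and variable names are illustrative only.

```python
import math

# Values quoted in the text; the luminosity distance comes from the Methods.
Z = 0.0821
D_L_MPC = 373.3
ARCSEC_IN_RAD = math.pi / (180.0 * 3600.0)


def kpc_per_arcsec(z: float, d_l_mpc: float) -> float:
    """Physical scale in kpc per arcsecond at redshift z."""
    d_a_mpc = d_l_mpc / (1.0 + z) ** 2  # angular-diameter distance
    return d_a_mpc * 1.0e3 * ARCSEC_IN_RAD  # Mpc -> kpc, small-angle limit


scale = kpc_per_arcsec(Z, D_L_MPC)
beam_major_kpc = 0.715 * scale
beam_minor_kpc = 0.533 * scale
print(f"scale = {scale:.2f} kpc per arcsec")
print(f"beam  = {beam_major_kpc:.2f} kpc x {beam_minor_kpc:.2f} kpc")
```

Under these assumptions the beam comes out close to the 1 kpc × 0.8 kpc stated in the text.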

These absorption features arise from cold molecular clouds moving 
towards the centre of the galaxy, via either radial or inspiralling 
trajectories. They manifest as continuum absorption because they cast 
‘shadows’ along the line of sight as the clouds eclipse or attenuate 
about 20% (or about 2 mJy) of the millimetre synchrotron continuum 
source, which serves as a bright backlight (13.6 mJy at rest-frame 
230 GHz). The synchrotron continuum is emitted by jets launched from 
the accreting supermassive (~3 × 10⁸ M☉; ref. 13) black hole in the 
galaxy’s active nucleus (Fig. 4). The absorbers must therefore be located 
somewhere between the observer and the galaxy centre, falling deeper 
into the galaxy at about +300 km s⁻¹ towards the black hole at its core. 
This radial speed is roughly equal to the expected circular velocity” in 
the nucleus, consistent either with a nearly radial orbit, or with highly 
non-circular motions in close proximity to the galaxy’s core. 

Figure 1 | A multiwavelength view of the Abell 2597 BCG. a, Chandra 
X-ray, HST and DSS optical, and Magellan Hα + [N II] emission is shown 
in blue, yellow and red, respectively. Arrows point to the thermally 
unstable hot atmosphere and buoyant bubbles that permeate it. (Image 
credits: X-ray, NASA/CXC/Michigan State University/G. Voit et al.; 
optical, NASA/STScI and DSS; Hα, Carnegie Observatory/Magellan/W. 
Baade Telescope/University of Maryland/M. McDonald). b, HST image 
of Lyα emission associated with the ionized gas nebula’, with 8.4 GHz 
radio contours overlaid in black. The filamentary ionized nebula consists 
of cooler gas that has precipitated from the hot X-ray bright halo shown 
in a. c, Unsharp mask of the HST far-ultraviolet (FUV) continuum image 
of the central regions of the nebula’®. The FUV emission directly traces 
the locations of young stars in the nebula. Very Large Array (VLA) radio 
contours of the 8.4 GHz source are overlaid in red. 

Gaussian fits to the spectral absorption features reveal narrow 
linewidths of σ_v ≲ 6 km s⁻¹, which means the absorbers are more probably 
spatially compact, with sizes that span tens (rather than hundreds or 
thousands) of parsecs. The shapes of the absorption lines remain 
roughly the same regardless of how finely the spectra are binned, 
suggesting that the absorbers are probably coherent structures, rather than 
a superposition of many smaller absorbers unresolved in velocity space. 
If each absorption feature corresponds to one coherent cloud, and if 
those clouds roughly obey size–linewidth relations”>”° for giant 
molecular clouds in the Milky Way, they should have diameters not larger 
than ~40 pc. If in virial equilibrium, molecular clouds this size would 
have masses of the order of 10⁵–10⁶ M☉, and if in rough pressure 
equilibrium with their ambient multiphase (10³–10⁷ K) environment”, they 
must have high column densities of the order of N_H2 ≈ 10²²–10²⁴ cm⁻² 
so as to maintain pressure support. The thermal pressure in the core of 
Abell 2597 BCG is nearly 3,000 times” greater than that for the Milky 
Way, however, which means the absorbing clouds may be much smaller. 

Figure 2 | ALMA observation of continuum-subtracted CO(2-1) 
emission in the Abell 2597 BCG. Emission is integrated from 
−600 km s⁻¹ to +600 km s⁻¹ relative to the galaxy’s systemic velocity. 
Channels are binned to 40 km s⁻¹. Only ≥3σ emission is shown. 8.4 GHz 
VLA radio contours are overlaid in black, and Hα contours outlining the 
rough boundary of the ionized nebula are shown in grey. The nebula is 
slightly larger than the grey contours suggest: emission outside this 
boundary is still part of a smooth, fainter distribution of cold gas, 
co-spatial with similarly faint emission in the optical. 

The absorbers have optical depths in the range 0.1 ≲ τ_CO(2-1) ≲ 0.3. 
The physical resolution of the ALMA data is larger than the synchro- 
tron background source, which means that the optical depth is probably 
contaminated by an unresolved, additive superposition of both emis- 
sion and absorption within the beam. Compact, dense cold clouds are 
nevertheless likely to be optically thick, which may mean they eclipse 
the continuum source with an optical depth of unity but a small 
covering factor of roughly 0.2. Especially when considering beam 
contamination by emission, the covering factor cannot be known with 
certainty, as this depends on the unknown geometry of the absorbing 
and emitting regions within the ALMA beam. 

This geometry can be constrained, however, given existing Very 
Long Baseline Array (VLBA) radio observations at extremely high 
spatial resolution!”. These data resolve the 1.3 GHz and 5 GHz radio 
continuum source down to scales of 25 pc, revealing a highly 
symmetric, 100-pc-scale jet about a bright radio core (Fig. 4c). Just as we 
have found in cold molecular gas, inflowing warmer atomic hydrogen 
gas (H I) has previously been found in absorption against this 
parsec-scale jet, corroborating prior reports of inflowing atomic gas at lower 
spatial resolutions’. The inflow velocity of this gas matches that seen 
in our ALMA data. Remarkably, both the optical depth and the linewidth 
of the warm atomic absorption signal vary dramatically across 
the jet, with a broad (σ_v ≈ 310 km s⁻¹) component co-spatial with the 
core that is absent just ~20 pc to the northeast, where only a narrow 
(σ_v ≈ 50 km s⁻¹) H I line is found at the same redshift. This effectively 
requires the inflowing atomic gas to be confined within the innermost 
~100 pc of the black hole, as gas further out would give rise to an 
unchanging absorption signal across the compact jet. The infall velocity 
is the same as that for the cold molecular clouds seen in CO(2-1) 
absorption, which means they most probably stem from the same 
spatial region, within tens of parsecs of the accreting black hole. 

Figure 3 | ‘Shadows’ cast by molecular clouds moving towards the 
supermassive black hole. a, Continuum-subtracted ALMA CO(2-1) 
spectrum extracted from a central 10 kpc region. Brackets mark CO(2-1) 
emission shown in b, where 8.4 GHz radio contours are overlaid. The 
central radio contours have been removed to aid viewing of the continuum 
absorption, seen as the blue/black spot of ‘negative’ emission (which is 
the radio and mm core, the centre of the galaxy and the location of the 
black hole). c, Continuum-subtracted CO(2-1) spectrum extracted from 
this region co-spatial with the mm and radio core. Absorption lines are 
indicated in red. 

This is further supported by the ALMA data itself. In emission, all 
gas around approximately +300 km s⁻¹ that is conceivably available to 
attenuate the continuum signal is confined to the innermost 2 kpc about 
the nucleus (Fig. 4a, b). The radial dependence of molecular cloud 
volume number density within this region is uncertain, but probably 
steeper than r⁻¹, and likely to be closer to r⁻² (Fig. 4b). This means that 
the chances of a random line of sight crossing will drop with increasing 
distance from the black hole. If the gas volume density goes as r⁻², a 
cloud 100 pc from the black hole is ten times more likely to cross our 
line of sight than a cloud at a galactocentric distance of 1 kpc. It would 
be exceedingly unlikely for three such clouds to cross our line of sight 
to the black hole were they spread over several kiloparsecs throughout 
the galaxy’s outskirts. 

The data therefore serve as strong observational evidence for an 
inward-moving, clumpy distribution of molecular clouds within a few 
hundred parsecs of an accreting supermassive black hole. The result 
augments a small but growing set of known molecular absorption 
systems’”-”? whose black hole proximity is less well constrained. The 
infalling clouds in Abell 2597 BCG are probably a few to tens of parsecs 
across and therefore massive (perhaps 10⁵–10⁶ M☉ each). If they are 
falling directly towards the black hole, rather than bound in a 
non-circular orbit that tightly winds around it, they could supply an 
upper-limit accretion rate of the order of 0.1 to a few solar masses per 
year, depending on the three-dimensional distribution of infalling 
clouds. If most of the clouds are instead locked in non-circular orbits 
around the black hole, the fuelling rate would depend on the gas 
angular momentum, and the local supply of torques that might lessen it. 
Simulations suggest”!®'* that such torques may be plentiful, as they 
predict a stochastic ‘rain’ of thermal instabilities that condense from 
all directions around the black hole, promoting angular momentum 
cancellation via tidal stress and cloud-cloud collisions. Even highly 
elliptical cloud orbits should therefore be associated with significant 
inward radial motions. The clouds might fall onto the accretion disk 
itself, or into a clumpy rotating ring akin to the ‘torus’ invoked in AGN 
unification models*”. 

Figure 4 | Corroborating evidence that the inflowing molecular clouds 
must be in close proximity to the black hole. a, CO(2-1) absorption 
spectrum from Fig. 3, with a region of emission at about +300 km s⁻¹ 
marked in yellow. b, Integrated CO(2-1) emission (colour coded) from 
this region, showing that gas at about +300 km s⁻¹ is confined to the 
innermost 2 kpc of the galaxy. c, 1.3 GHz radio continuum source from an 
archival VLBA observation’” with an extremely high physical resolution 
of ~25 pc by ~10 pc. d, e, Plots of 1.3 GHz radio continuum emission 
revealing H I 21 cm absorption observed against this synchrotron jet. 
The signal varies dramatically over scales of tens of parsecs. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Observations, data reduction, and analysis. The new ALMA data presented in this 
paper were obtained in Cycle 1 with the use of 29 operational antennae in the 12 m 
Array. ALMA’s Band 6 heterodyne receivers were tuned to a frequency of 213 GHz, 
sensitive to the J = 2-1 rotational line transition of carbon monoxide at the redshift 
of the Abell 2597 BCG (z = 0.0821). The ALMA correlator, set to Frequency Division 
Mode (FDM), delivered a bandwidth of 1,875 MHz (per baseband) with a 0.488 MHz 
channel spacing, for a maximum spectral resolution of about 2 km s⁻¹. One baseband 
was centred on the CO(2-1) emission line, while the other three sampled the local 
continuum. Maximum antenna baselines extended to ~1 km, delivering an angular 
resolution at 213 GHz of ~0.7″ within a ~28″ primary beam (field of view). ALMA 
observed the Abell 2597 BCG, located at RA 23 h 25 min 20 s, dec. −12° 07′ 38″ 
(J2000), for a total of ~3 h over three separate scheduling blocks executed between 
17 and 19 November 2013. The planet Neptune and quasars J2258−2758 and 
J2331−1556 were used for amplitude, flux, and phase calibration. The data were 
reduced using CASA version 4.2 with calibration and imaging scripts kindly provided 
by the ALMA Regional Centers (ARCs) in both Garching, Germany and Manchester, 
UK. Beyond the standard application of the phase calibrator solution, we iteratively 
performed self-calibration of the data using the galaxy’s own continuum, yielding a 
~14% decrease in RMS noise to a final value of 0.16 mJy per 0.715″ × 0.533″ beam 
per 40 km s⁻¹ channel. There is effectively no difference in CO(2-1) morphology 
between the self-calibrated and non-self-calibrated data cubes. Measurement sets were 
imaged using ‘natural’ visibility weighting and binning to either 5 km s⁻¹, 10 km s⁻¹, 
or 40 km s⁻¹, as indicated in the figure legends. The figures presented in this Letter 
show only continuum-subtracted, pure CO(2-1) line emission. The rest-frame 
230 GHz continuum observation is dominated by a bright (13.6 mJy) point source 
associated with the AGN (detected at ≥400σ), serving as the bright ‘backlight’ against 
which the continuum absorption features presented in this Letter were observed. The 
continuum data also feature compact (~5 kpc) extended emission at ~10σ that 
extends along the galaxy’s dust lane, to be discussed in a forthcoming paper. 
Adoption of a systemic velocity. Interpretation of gas motions relative to the stellar 
component of a galaxy requires adoption of a systemic (stellar) velocity to be used 
as a ‘zero point’ marking the transition from blue- to redshift. All CO(2-1) line 
velocities discussed in this Letter are set relative to 213.04685 GHz, where observed 
CO(2-1) emission peaks. This frequency corresponds to ¹²CO(2-1) (rest-frame 
230.538001 GHz) at a redshift of z = 0.0821. This redshift is consistent, conservatively 
within ±60 km s⁻¹, with every other available multiwavelength tracer of the galaxy’s 
systemic velocity, including prominent Ca II H, K, and G-band absorption features!” 
that directly trace the galaxy stellar component, the redshift of all optical emission 
lines*!, as well as a broad (FWHM ≈ 412 km s⁻¹) H I absorption component" at the 
optical emission and absorption line redshift. It is also consistent, within ~60 km s⁻¹, 
with a cross-correlation of emission and absorption lines using galaxy template 
spectra’, as well as with all other published reports of the galaxy’s systemic velocity 
(found, for example, within the HyperLeda database). We are therefore certain that the 
reported redshift of the absorption features discussed in this Letter indeed corresponds 
to real motion relative to the galaxy’s stellar component. Without caveat or ambiguity, 
the absorbing cold clouds are moving into the galaxy at roughly 300 ± 60 km s⁻¹. 
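
The zero point described above follows directly from the rest frequency and the adopted redshift. The sketch below recomputes it and converts between observed frequency and velocity offset. The radio velocity convention used here is an assumption made for illustration (the text does not state which convention was applied), and the function names are hypothetical.

```python
C_KMS = 299_792.458          # speed of light, km/s
NU_REST_GHZ = 230.538001     # 12CO(2-1) rest frequency quoted in the text
Z_SYS = 0.0821               # adopted systemic redshift

# Observed frequency that defines zero velocity.
nu_sys_ghz = NU_REST_GHZ / (1.0 + Z_SYS)


def velocity_offset_kms(nu_obs_ghz: float, nu_ref_ghz: float = nu_sys_ghz) -> float:
    """Velocity relative to systemic (radio convention, assumed here).

    Positive values are redshifted, that is, observed at lower frequency.
    """
    return C_KMS * (nu_ref_ghz - nu_obs_ghz) / nu_ref_ghz


def frequency_at_offset_ghz(v_kms: float, nu_ref_ghz: float = nu_sys_ghz) -> float:
    """Inverse of velocity_offset_kms under the same convention."""
    return nu_ref_ghz * (1.0 - v_kms / C_KMS)


print(f"systemic frequency: {nu_sys_ghz:.5f} GHz")
for v in (240.0, 275.0, 335.0):  # absorption line centres from the main text
    print(f"+{v:.0f} km/s -> {frequency_at_offset_ghz(v):.5f} GHz")
```

At these velocities the radio and optical conventions differ by well under 1 km s⁻¹, so the choice does not affect the conclusions.
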
Mass estimates. All molecular gas masses estimated in this Letter adopt the 
following relation*: 

$$
M_{\mathrm{mol}} = 1.05 \times 10^{4}
\left(\frac{X_{\mathrm{CO}}}{2 \times 10^{20}\,\mathrm{cm^{-2}\,(K\,km\,s^{-1})^{-1}}}\right)
\left(\frac{1}{1+z}\right)
\left(\frac{S_{\mathrm{CO}}\Delta v}{\mathrm{Jy\,km\,s^{-1}}}\right)
\left(\frac{D_{\mathrm{L}}}{\mathrm{Mpc}}\right)^{2} M_{\odot}
$$

where S_CO Δv is the emission integral (effectively the total CO flux over the region 
of interest), z is the galaxy redshift (z = 0.0821), and D_L its luminosity distance 
(373.3 Mpc), for which we assume a flat ΛCDM model wherein H₀ = 70 km s⁻¹ 
Mpc⁻¹, Ω_M = 0.3, and Ω_Λ = 0.7. This mass estimate most critically relies on an 
assumption of the CO-to-H₂ conversion factor*”, X_CO. In this Letter we assume the 
average Milky Way value of X_CO = 2 × 10²⁰ cm⁻² (K km s⁻¹)⁻¹ and a CO(2-1) to 
CO(1-0) flux density ratio of 3.2. Other authors have provided extensive discussion 
of these assumptions as they pertain to cool core BCGs””394. Scientific conclusions 
in this paper are largely insensitive to choice of X_CO. 
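
As a consistency check on the adopted cosmology, the luminosity distance can be recomputed by numerical integration. This is a minimal sketch for a flat model with matter and a cosmological constant only (radiation neglected, which is an assumption); the function name and the Simpson-rule step count are illustrative choices.

```python
import math

C_KMS = 299_792.458   # speed of light, km/s
H0 = 70.0             # km/s/Mpc, as stated in the text
OMEGA_M = 0.3
OMEGA_L = 0.7


def luminosity_distance_mpc(z: float, steps: int = 1000) -> float:
    """Luminosity distance in a flat matter + Lambda universe.

    Integrates dz / E(z) with the composite Simpson rule (steps must be even).
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")

    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + OMEGA_L)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    comoving_mpc = (C_KMS / H0) * total * h / 3.0
    return (1.0 + z) * comoving_mpc


d_l = luminosity_distance_mpc(0.0821)
print(f"D_L = {d_l:.1f} Mpc")
```

With these inputs the result lands within a fraction of a megaparsec of the 373.3 Mpc quoted above.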

A single Gaussian fit to the CO(2-1) spectrum extracted from an aperture 
containing all detected emission yields an emission integral of S_CO Δv = 4.2 ± 0.4 Jy 
km s⁻¹ with a line FWHM of 252 ± 14 km s⁻¹, corresponding to a total molecular 
hydrogen (H₂) gas mass of M_H2 = (1.80 ± 0.19) × 10⁹ M☉. This is very close to 
the previously reported'> mass, based on an IRAM 30 m CO(2-1) observation, of 
(1.8 ± 0.3) × 10⁹ M☉. This comparison is not one-to-one, as the mass from the 
IRAM 30 m observation was computed from within a beam size of 11″ (rather than 
28″ for the ALMA data), and used a CO(2-1)/CO(1-0) flux ratio of 4 (rather than 
3.2, as we use here). These differences are minor, particularly because nearly all of 
the CO(2-1) emission detected by ALMA is found within the central 11″ size of 
the IRAM 30 m beam. It is therefore safe to say that our ALMA observation has 
detected nearly all emission that was detected in the single-dish IRAM 30 m 
observation, and that very little extended emission has been ‘resolved out’ by ALMA. 
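
The mass relation can be evaluated with the numbers quoted in this section. The sketch below assumes that the CO(2-1) emission integral is first divided by the stated flux ratio of 3.2 to obtain an equivalent CO(1-0) integral before the relation is applied; that ordering is an interpretation, and the function name is hypothetical.

```python
def molecular_mass_msun(
    s_co21_jy_kms: float,
    z: float,
    d_l_mpc: float,
    x_co: float = 2.0e20,      # cm^-2 (K km/s)^-1, Milky Way average
    ratio_21_to_10: float = 3.2,
) -> float:
    """Molecular gas mass in solar masses from a CO(2-1) emission integral."""
    s_co10 = s_co21_jy_kms / ratio_21_to_10   # assumed conversion to CO(1-0)
    return 1.05e4 * (x_co / 2.0e20) * s_co10 * d_l_mpc ** 2 / (1.0 + z)


mass = molecular_mass_msun(4.2, 0.0821, 373.3)
print(f"M_mol = {mass:.2e} solar masses")
```

Under these assumptions the result agrees with the quoted (1.80 ± 0.19) × 10⁹ M☉ to within a few per cent, well inside the stated uncertainty.
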
Estimating physical properties of the redshifted absorbing molecular gas. We 
have estimated a rough upper-limit size of the absorbing clouds assuming the 
widely adopted Larson et al.”° and Solomon et al.”° size–linewidth relation for 
molecular clouds in the Milky Way (namely, the ref. 26 fit): 

$$
\sigma_v = (1.0 \pm 0.1)\, S^{0.5 \pm 0.05}\ \mathrm{km\,s^{-1}}
$$

where σ_v is the velocity linewidth of the cloud and S is the diameter of the cloud in 
parsecs. A measured absorber linewidth of σ_v ≈ 6 km s⁻¹ would then correspond 
to a size of ~36 pc. As noted in the main text of the Letter, the thermal pressure in 
the Abell 2597 BCG is about 3,000 times higher than that for the Milky Way)’, so 
it is likely that the above relation does not apply. A higher ambient pressure implies 
higher compression and therefore smaller cloud size, so the above estimate should, 
at best, be considered a very rough upper limit. The main lesson to take away from 
this exercise is that the absorbing clouds are probably physically compact (that is, 
a few to tens, rather than hundreds, of parsecs in diameter). 
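
Inverting the size–linewidth relation is a one-line calculation. The sketch below uses only the central values of the fit (coefficient 1.0, exponent 0.5) and ignores the quoted uncertainties; the function name is illustrative.

```python
def cloud_diameter_pc(sigma_v_kms: float, coeff: float = 1.0, exponent: float = 0.5) -> float:
    """Cloud diameter S (pc) from sigma_v = coeff * S**exponent (km/s)."""
    return (sigma_v_kms / coeff) ** (1.0 / exponent)


size_pc = cloud_diameter_pc(6.0)
print(f"S = {size_pc:.0f} pc for a 6 km/s linewidth")
```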

The three clouds are separated from one another by ~45–60 km s⁻¹ in velocity 
space, which means they are unlikely to be closely bound satellites of one another. 
Instead, it is more likely that they represent three random points along a radial 
distribution of clouds. 

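If the absorbers are in virial equilibrium, their masses M_cloud can be roughly 
estimated by applying the virial relation, 

$$
M_{\mathrm{cloud}} \approx \frac{R_{\mathrm{cloud}}\,\sigma_v^{2}}{G}
\approx \frac{20\,\mathrm{pc} \times (6\,\mathrm{km\,s^{-1}})^{2}}
{4.302 \times 10^{-3}\,\mathrm{pc}\,M_{\odot}^{-1}\,(\mathrm{km\,s^{-1}})^{2}}
\approx 1.7 \times 10^{5}\,M_{\odot}
$$

where R_cloud is the cloud radius (as roughly estimated above) and σ_v is its velocity 
dispersion (also as above). 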
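
The virial estimate can be reproduced numerically with the same inputs. This is a sketch of the simple form M ≈ Rσ²/G used above, with no geometric prefactor; the function name is hypothetical.

```python
G_PC_MSUN_KMS2 = 4.302e-3  # gravitational constant in pc Msun^-1 (km/s)^2


def virial_mass_msun(radius_pc: float, sigma_v_kms: float) -> float:
    """Order-of-magnitude virial mass, M ~ R * sigma^2 / G."""
    return radius_pc * sigma_v_kms ** 2 / G_PC_MSUN_KMS2


m_cloud = virial_mass_msun(20.0, 6.0)
print(f"M_cloud = {m_cloud:.2e} solar masses")
```
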
CO(2-1) optical depths for the absorbers were estimated by assuming that: 

$$
I_{\mathrm{total}} = I_{\mathrm{continuum}}\, e^{-\tau_{\mathrm{CO(2-1)}}}
$$

where I_total and I_continuum are the integrated intensities of the total (line plus 
continuum) and continuum-only signals, respectively, and τ_CO(2-1) is the optical depth 
of the CO(2-1) absorption feature. 
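
Solving the attenuation relation for the optical depth gives τ = −ln(I_total / I_continuum). The sketch below applies it to round numbers from the main text (a 13.6 mJy continuum attenuated by about 2 mJy); these inputs are illustrative and are not the per-line measurements.

```python
import math


def optical_depth(i_total: float, i_continuum: float) -> float:
    """Optical depth from I_total = I_continuum * exp(-tau)."""
    if not 0.0 < i_total <= i_continuum:
        raise ValueError("expected 0 < i_total <= i_continuum")
    return -math.log(i_total / i_continuum)


# Illustrative inputs: 13.6 mJy backlight, about 2 mJy absorbed.
tau = optical_depth(13.6 - 2.0, 13.6)
print(f"tau = {tau:.2f}")
```

The value falls inside the 0.1 to 0.3 range reported for the absorbers.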

The stellar velocity dispersion of the BCG*® is σ_v = 220 ± 19 km s⁻¹. Under the 
assumption of an isothermal sphere, the circular velocity should be ~300 km s⁻¹ 
(that is, √2 σ_v), which is roughly the line of sight velocity of the absorption features. 
That the absorbers’ redshift is a significant fraction of the expected circular veloc- 
ity means they could be on a nearly radial orbit (though their transverse velocity 
cannot be known with this single observation). 
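
The isothermal-sphere estimate is a single multiplication, shown here with the quoted dispersion and a simple linear propagation of its uncertainty (the propagation is an addition for illustration, not something stated in the text).

```python
import math

SIGMA_STAR_KMS = 220.0      # stellar velocity dispersion from the text
SIGMA_STAR_ERR_KMS = 19.0   # its quoted uncertainty

# Singular isothermal sphere: v_circ = sqrt(2) * sigma.
v_circ = math.sqrt(2.0) * SIGMA_STAR_KMS
v_circ_err = math.sqrt(2.0) * SIGMA_STAR_ERR_KMS
print(f"v_circ = {v_circ:.0f} +/- {v_circ_err:.0f} km/s")
```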

While not discussed in the main text, there is an additional simple argument 
that independently suggests that the inward moving molecular clouds must be in 
close proximity to the black hole. If our line of sight is representative, and therefore 
a ‘pencil beam’ sample of a three-dimensional spherical distribution of clouds, 
the total mass of cold gas contained within this distribution should go roughly as: 

$$
M \propto f_c\, r^{2} N_{\mathrm{H}}
$$

where f_c is the covering factor and r is the radius of an imaginary thin spherical shell 
of molecular gas with column density N_H. If such a shell had a covering factor of 1, 
a radius of 1 kpc, and a column density of 10²² cm⁻², then the total mass of 
molecular hydrogen contained within that shell would be roughly one billion solar masses. 
A column density in excess of 10²² cm⁻² requires this distribution to be contained 
within a sphere of radius ≲1 kpc, lest the limit set by the total mass of molecular 
hydrogen in the galaxy be violated. If the characteristic column density is 10²³ cm⁻², 
for example, this mass must be contained within a sphere of radius 300 pc, or else its 
total mass would exceed the 1.8 × 10⁹ M☉ of cold gas present in the galaxy. 
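
The shell argument can be made concrete with a short calculation. The text gives only a proportionality, so the normalization below is an assumption: a full spherical shell of area 4πr², pure molecular hydrogen, and no helium correction. The constants and function name are illustrative.

```python
import math

KPC_CM = 3.0857e21            # centimetres per kiloparsec
M_H2_G = 2.0 * 1.6735e-24     # mass of one H2 molecule in grams
MSUN_G = 1.989e33             # solar mass in grams


def shell_mass_msun(radius_kpc: float, n_h2_cm2: float, covering: float = 1.0) -> float:
    """Mass of a thin spherical shell with a given H2 column density.

    Assumes M = f_c * 4 * pi * r^2 * N * m_H2 (normalization is an assumption).
    """
    area_cm2 = 4.0 * math.pi * (radius_kpc * KPC_CM) ** 2
    return covering * area_cm2 * n_h2_cm2 * M_H2_G / MSUN_G


m_1kpc = shell_mass_msun(1.0, 1.0e22)
m_300pc = shell_mass_msun(0.3, 1.0e23)
print(f"r = 1 kpc,  N = 1e22 cm^-2: {m_1kpc:.1e} solar masses")
print(f"r = 300 pc, N = 1e23 cm^-2: {m_300pc:.1e} solar masses")
```

With this normalization the 1 kpc shell comes out at the order of 10⁹ M☉, and the 300 pc shell at 10²³ cm⁻² comes out close to the 1.8 × 10⁹ M☉ of cold gas in the galaxy, which is the limit the argument relies on.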

Codes, software, and data availability. Codes that we have written to both reduce 
and analyse the data presented in this Letter have been made publicly available 
at https://github.com/granttremblay/Tremblay_Nature_ALMA_Abell2597. 
Reduction of the data as well as some simple modelling (for example, fitting of 
Gaussians to lines) was performed using routines included in CASA version 4.2, 
available at https://casa.nrao.edu/. Plots were made using both Python’s 
Matplotlib and Veusz, which is available at http://home.gna.org/veusz/. 

superconducting circuit 

Quantum mechanics can help to solve complex problems in physics! 
and chemistry’, provided they can be programmed in a physical 
device. In adiabatic quantum computing*~°, a system is slowly 
evolved from the ground state of a simple initial Hamiltonian to 
a final Hamiltonian that encodes a computational problem. The 
appeal of this approach lies in the combination of simplicity and 
generality; in principle, any problem can be encoded. In practice, 
applications are restricted by limited connectivity, available 
interactions and noise. A complementary approach is digital 
quantum computing’, which enables the construction of arbitrary 
interactions and is compatible with error correction”*®, but uses 
quantum circuit algorithms that are problem-specific. Here we 
combine the advantages of both approaches by implementing 
digitized adiabatic quantum computing in a superconducting 
system. We tomographically probe the system during the digitized 
evolution and explore the scaling of errors with system size. We 
then let the full system find the solution to random instances of the 
one-dimensional Ising problem as well as problem Hamiltonians 
that involve more complex interactions. This digital quantum 
simulation® |” of the adiabatic algorithm consists of up to nine 
qubits and up to 1,000 quantum logic gates. The demonstration of 
digitized adiabatic quantum computing in the solid state opens a 
path to synthesizing long-range correlations and solving complex 
computational problems. When combined with fault-tolerance, our 
approach becomes a general-purpose algorithm that is scalable. 

A key challenge in adiabatic quantum computing is to construct 
a device that is capable of encoding problem Hamiltonians that are 
classically intractable, that is, non-stoquastic!’. Such Hamiltonians 
would enable universal adiabatic quantum computing'*® and 
improve the performance for difficult instances of classical opti- 
mization problems’®. Additionally, simulating interacting fermions 
for applications in physics and chemistry requires non-stoquastic 
Hamiltonians!!”. In general, these Hamiltonians are more difficult 
to study classically, because Monte Carlo simulations fail to con- 
verge owing to the ‘sign problem’. A hallmark of non-stoquastic 
Hamiltonians is the need for several distinct types of coupling, for 
example, σ_zσ_z and σ_xσ_x couplings with different signs, where σ_x 
and σ_z are Pauli operators. With a digitized approach, different 
couplings can be constructed without change of hardware. 
Long-range multibody interactions can be assembled to aid in 
quantum tunnelling!’ or to encode the non-local terms for fermionic 
simulations”*”!. And finally, analogue systems exhibit noise, which 
can thwart the evolution, whereas digital systems can be fully 
fault-tolerant. Crucially, this ability makes the digitized approach 
scalable, because any non-corrected implementation is ultimately 
limited by the accumulation of error. Our experiment addresses the 
challenge of adiabatically evolving to final problem Hamiltonians 
that are non-stoquastic. 

We explore the adiabatic quantum evolutions of one-dimensional 
spin chains with nearest-neighbour coupling. We start with a simple 
ferromagnetic problem to visualize the adiabatic evolution process. We 
identify specific error contributions, and follow up by exploring the 
scaling of errors with system size. We finish by testing the device on 
random stoquastic and non-stoquastic problems. The initial (‘I’) and 
problem (‘P’) Hamiltonians are 

$$
H_{\mathrm{I}} = -B_{x,\mathrm{I}} \sum_i \sigma_x^{i}
$$

$$
H_{\mathrm{P}} = -\sum_i \left(B_x^{i}\sigma_x^{i} + B_z^{i}\sigma_z^{i}\right)
 - \sum_i \left(J_{zz}^{i,i+1}\sigma_z^{i}\sigma_z^{i+1} + J_{xx}^{i,i+1}\sigma_x^{i}\sigma_x^{i+1}\right)
$$

where $B_x^{i}$ and $B_z^{i}$ denote local field strengths of the ith qubit, 
$J_{zz}^{i,i+1}$ and $J_{xx}^{i,i+1}$ denote the σ_zσ_z and σ_xσ_x coupling 
strengths, respectively, between qubits i and i + 1, and $B_{x,\mathrm{I}}$ denotes 
the initial field strength, which is equal for all qubits. The Ising model is 
recovered when $B_x^{i} = J_{xx}^{i,i+1} = 0$ for all i. We initialize the system 
with $H_{\mathrm{I}}$ and vary the system Hamiltonian to the final problem: 
$H = sH_{\mathrm{P}} + (1 - s)H_{\mathrm{I}}$, with s going from 0 to 1. An 
example problem is shown in Fig. 1a. 



Figure 1 | Spin-chain problem and device. a, We implement 
one-dimensional spin problems with variable local fields and couplings 
between adjacent spins. An example of a stoquastic problem Hamiltonian 
with local x and z fields, indicated by the gold arrows in the spheres, 
and σ_zσ_z couplings, whose strength is indicated by the radius of the 
links, is shown. Red denotes a ferromagnetic (J = +1) and blue an 
antiferromagnetic (J = −1) link. The problem Hamiltonian is for the 
instance shown in Fig. 4c. b, Optical picture of the superconducting 
quantum device with nine Xmon” qubits Q0–Q8 (false-coloured 
cross-shaped structures), made from aluminium (light) on a sapphire substrate 
(dark). Connections to read-out resonators are at the top; control wiring is 
at the bottom. Scale bar, 200 μm. 

Figure 2 | Quantum state tomography of the digital evolution into a 
Greenberger–Horne–Zeilinger state. A four-qubit system is adiabatically 
evolved from an initial Hamiltonian in which all spins are aligned along 
the x axis to a problem Hamiltonian with equal ferromagnetic couplings 
between adjacent qubits (J_zz = 2). a, Real part of the experimental 
density matrix ρ at the start (left-most panel) and after each Trotter 
step, showing the growth of the major elements on the four corners, 
measured using quantum state tomography. The target state is shown 
with a black outline in the right-most panel. The final state has a fidelity 
of 0.55. Coloured squares surrounding the left-most panel indicate qubit 
indices: for example, Q0 being excited is indicated by a red square. Black 
arrows indicate notable elements for states that differ from the target 
state by a single kink. b, As in a, but for the ideal digitized evolution, 
showing major elements on the four corners as well as other populations 
and correlations. c, Hamiltonian at different s, showing the vanishing 
transversal field and increasing coupling strength; arrows and links as 
in Fig. 1a. d, Gate sequence showing initialization and the five Trotter 
steps. e, Pulse sequence, showing the single-qubit microwave gates 
(wave-like pulses) and frequency detuning (rectangular-like) pulses. 
Corresponding interactions and local field terms are highlighted. The 
displayed five-step algorithm is 2.1 μs long. Colours correspond to the 
physical qubits in Fig. 1b. Implementations of σ_zσ_z coupling and local 
x-fields are highlighted. Angles of rotation are denoted by φ and θ. See 
Supplementary Information for imaginary parts of the density matrices 
and the ideal continuous evolution. 

The spin system is formed by a superconducting circuit with nine 
qubits. The qubits are the cross-shaped structures”’, patterned out of an 
aluminium layer on top of a sapphire substrate, and arranged in a linear 
chain; see Fig. 1b. Each qubit is capacitively coupled to its nearest 
neighbours, and can be individually controlled and measured; for details 
see ref. 23. By tuning the frequencies of the qubits we can implement a 
tunable controlled-phase entangling gate. We use the first-order Trotter 
expansion to digitize™*. The evolution is divided into many steps and 
implemented using gates; see Supplementary Information. 

For quantifying digitized adiabatic evolutions there are four sets of 
data: (1) the ideal continuous time evolution, for infinite time, which is 
free of error and provides the perfect solution, and which we refer to as 
the ‘target state’; (2) the ideal continuous time evolution for a finite time T, 
which is sensitive to non-adiabatic errors, and which we call ‘ideal 
continuous evolution’; (3) the ‘ideal digital evolution’, where the finite 
ideal continuous evolution is digitized, and which therefore includes 
digital error as well as non-adiabatic errors; and (4) the experimental 
results, which include a contribution from gate errors as well. 

We start with a ferromagnetic chain problem with N = 4 spins, 
and equal coupling strength J_zz = 2. The qubits are initialized in the 
|+⟩^⊗4 state, and we use five steps to evolve the system to the 
problem Hamiltonian, performing quantum state tomography after each 
step. We linearly decrease the B_x term to zero, starting at B_x = 2, and 
simultaneously increase the coupling strength from 0 to 2, ending the 
evolution at a scaled time of |J|T = 6. The density matrices are shown in 
Fig. 2a. With each step, the quantum state evolves and matrix elements 
in the middle vanish while the elements at the four corners grow to 
form the density matrix ρ of the Greenberger–Horne–Zeilinger (GHZ) 
state, the solution to the ferromagnetic problem, with a fidelity 
Tr(ρ_target-state ρ) = 0.55. The density matrix is constrained to be physical”. 
The ideal digital evolution is plotted in Fig. 2b, reaching a fidelity of 
0.85. The Hamiltonian during evolution, construction of the algorithm 
and the pulse sequence are shown in Fig. 2c-e. In each Trotter step, we 
perform a σ_zσ_z operation on each pair to implement the ferromagnetic 
σ_zσ_z coupling, followed by single-qubit rotations around the x axis to 
simulate the transversal magnetic field. In the pulse sequence, the 
rectangular-like frequency detuning pulses indicate where the σ_zσ_z 
interaction is implemented by bringing qubits near resonance (highlighted 
for s = 0.2 in Fig. 2d, e). The wave-like pulses are microwave gates. 
The decrease in B_x is reflected by the reduction in amplitudes of the 
corresponding pulses (highlighted for s = 0.4 and s = 1.0 in Fig. 2d, e). 
Additional microwave echo pulses decrease coupling to other qubits 
and the environment. We find mean phase errors from neighbouring 
parasitic interactions to be around 0.05 rad, equivalent to an error con- 
tribution below 107? (see Supplementary Information). 
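
The digitization described in this paragraph can be illustrated with a minimal numerical sketch. It is not the authors' pulse-level implementation. It assumes the Hamiltonian $H(s) = -(1-s)B_x\sum_i \sigma_x^{(i)} - s\,J_{zz}\sum_i \sigma_z^{(i)}\sigma_z^{(i+1)}$ on an open chain, samples $s = k/5$ at Trotter step $k$, and takes $J_{zz} = 2$ so that |J|T = 6 corresponds to T = 3. These conventions, and the function names `digitized_ferromagnet` and `ghz_fidelity`, are assumptions made for illustration.

```python
import numpy as np

def digitized_ferromagnet(n_qubits=4, steps=5, total_time=3.0,
                          bx_start=2.0, jzz_end=2.0):
    """First-order Trotterized anneal from the |+...+> state (assumed conventions)."""
    dim = 2 ** n_qubits
    index = np.arange(dim)
    # bits[i, q] is the value of qubit q in computational basis state i
    bits = (index[:, None] >> np.arange(n_qubits)) & 1
    z = 1 - 2 * bits                                   # sigma_z eigenvalues (+1 / -1)
    zz_sum = (z[:, :-1] * z[:, 1:]).sum(axis=1)        # sum over nearest-neighbour links

    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    identity = np.eye(2, dtype=complex)

    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # ground state of the x-field term
    dt = total_time / steps
    for k in range(1, steps + 1):
        s = k / steps
        # Coupling part: exp(-i H_zz dt) with H_zz = -s * J * sum(zz) is diagonal
        psi = np.exp(1j * s * jzz_end * dt * zz_sum) * psi
        # Field part: exp(+i theta sigma_x) applied to every qubit
        theta = (1 - s) * bx_start * dt
        u1 = np.cos(theta) * identity + 1j * np.sin(theta) * sigma_x
        layer = np.array([[1.0 + 0j]])
        for _ in range(n_qubits):
            layer = np.kron(layer, u1)
        psi = layer @ psi
    return psi

def ghz_fidelity(psi):
    """Overlap |<GHZ|psi>|^2 with (|0...0> + |1...1>)/sqrt(2)."""
    ghz = np.zeros(len(psi), dtype=complex)
    ghz[0] = ghz[-1] = 1 / np.sqrt(2)
    return abs(np.vdot(ghz, psi)) ** 2

if __name__ == "__main__":
    print(ghz_fidelity(digitized_ferromagnet()))
```

Because both layers commute with a global bit flip and the initial state is flip-symmetric, the amplitudes of |0000⟩ and |1111⟩ stay equal throughout, which is why the evolution targets the GHZ state rather than a single ferromagnetic configuration. The fidelity this sketch returns depends on the assumed schedule and need not match the value quoted in the text.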

The experiment in Fig. 2 shows that digital synthesis of adiabatic 
evolutions can successfully be implemented in a solid-state quan- 
tum platform. Using five Trotter steps, 15 entangling gates and 144 
single-qubit microwave gates, the system produces a GHZ state with 
a fidelity that indicates genuine entanglement. It shows that complex 
pulse sequences are possible, and that the errors make sense: the fidelity
of the experimental data with respect to the ideal digital evolution is
0.64. The overlap between the ideal digital evolution and ideal
continuous time evolution for finite time is 0.93, and the overlap of this
continuous evolution with the GHZ state (see Supplementary Information)
is 0.88. The product of the above three values (0.52) is close to the
experimental fidelity of 0.55, and shows that the experimental error is
a combination of non-adiabatic, digitization and gate errors. Adopting
the entangling gate error of $7.4 \times 10^{-3}$ and single-qubit gate error of
$8 \times 10^{-4}$ as measured in
ref. 25, we expect an accumulated gate error of 0.23 whereas we find an
infidelity of 0.36; we attribute the difference to errors in maintaining
the phases of the four-qubit system for a duration of 2.1 µs.
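
The arithmetic behind this error budget can be checked with a short sketch. It assumes that the three overlaps multiply, and that gate errors add linearly to first order over the 15 entangling and 144 single-qubit gates. The per-gate errors of 7.4 × 10⁻³ (entangling) and 8 × 10⁻⁴ (single-qubit) are an assumed reading, chosen because it reproduces the quoted total of 0.23; the function names are hypothetical.

```python
def fidelity_product(overlaps):
    """Multiply a sequence of overlaps into one combined estimate."""
    result = 1.0
    for value in overlaps:
        result *= value
    return result

def accumulated_gate_error(counts_and_errors):
    """First-order estimate: sum of (gate count x per-gate error)."""
    return sum(count * error for count, error in counts_and_errors)

# experiment vs ideal digital, ideal digital vs continuous, continuous vs GHZ
overlaps = [0.64, 0.93, 0.88]
# (gate count, assumed per-gate error)
gates = [(15, 7.4e-3), (144, 8e-4)]

if __name__ == "__main__":
    print(round(fidelity_product(overlaps), 2))     # prints 0.52
    print(round(accumulated_gate_error(gates), 2))  # prints 0.23
```

The linear sum is only a first-order approximation; it is used here because it matches how the accumulated figure is compared with the measured infidelity of 1 − 0.64 = 0.36.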

An important feature of the errors is the prevalence of populations 
and correlations of the $|0001\rangle$, $|0011\rangle$ and $|0111\rangle$ states and their
bitwise inverses; see arrows in Fig. 2a. Their elements are also present
in the ideal digital results and in the ideal continuous evolutions (see 
Supplementary Information). These are states that deviate by a single 
kink from the target state, having a residual energy of 2|J|, indicating 
the presence of non-adiabatic errors. These kink errors are connected 
to the formation of defects during a phase transition, as described by 
the Kibble-Zurek mechanism’. 

To explore the scaling of errors we vary the system size from two to 
nine qubits and measure the likelihood of kinks and residual energy. 
We keep the ferromagnetic problem Hamiltonian, $J_{zz} = 2$, but vary the
scaled time such that |J|T goes from 0 to 3. For the two- to six-qubit 
systems we use five Trotter steps and for seven to nine qubits we use 
two steps, to limit the total number of gates. The kink likelihood for the 
four-qubit system is shown in Fig. 3a. Here, the likelihood of one kink is 
given by the sum of the probabilities of all states with one kink. When 
increasing |J|T from 0 to 3 the kink likelihood decreases, and the like- 
lihood of no kinks increases (black line in Fig. 3a). The experimental 
data closely follow the ideal digital evolution (dashed lines in Fig. 3a). 
This picture is repeated for all systems; see Supplementary Information. 
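
The kink bookkeeping used here can be sketched as follows, assuming an open chain in which a kink is any pair of neighbouring qubits with opposite values, and in which each kink costs 2|J| above the ferromagnetic ground state. The function names and the dictionary format for measured probabilities are illustrative assumptions.

```python
from itertools import product

def count_kinks(bitstring):
    """Number of neighbouring pairs that differ, e.g. '0011' has one kink."""
    return sum(a != b for a, b in zip(bitstring, bitstring[1:]))

def kink_likelihoods(probabilities):
    """Sum the probabilities of all states sharing the same number of kinks."""
    totals = {}
    for state, p in probabilities.items():
        kinks = count_kinks(state)
        totals[kinks] = totals.get(kinks, 0.0) + p
    return totals

def residual_energy(probabilities, coupling=2.0):
    """Mean energy above the target state, at 2|J| per kink."""
    return sum(2 * abs(coupling) * count_kinks(state) * p
               for state, p in probabilities.items())

# Hypothetical input: a uniform distribution over all four-qubit outcomes
uniform = {"".join(bits): 1 / 16 for bits in product("01", repeat=4)}

if __name__ == "__main__":
    print(kink_likelihoods(uniform))
    print(residual_energy(uniform))
```

For the uniform distribution each of the three links is a kink half of the time, so the mean is 1.5 kinks; this is the kind of value a fully random outcome would give before any adiabatic evolution lowers it.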

The kink likelihood indicates that the final state has residual energy, 
because a state with a single kink has energy 2|J| above the target state. 
The residual energies for all systems are plotted in Fig. 3b. Initially, the 
residual energy is constant at |J|T ≈ 0, then starts to decrease around
|J|T ≈ 0.5, following the ideal digital (dashed lines in Fig. 3b) and ideal
continuous time (dotted lines in Fig. 3b) evolutions. For two to six 
qubits, this decrease continues until the traces start to settle around 
|J|T =3. For the seven- to nine-qubit systems, the residual energy starts 
to increase again around |J|T =2, following the ideal digital evolution. 
See Supplementary Information for the pulse sequence for the nine- 
qubit experiment, all kink likelihoods and the differences between the 
residual energies. 

The main result is that Fig. 3 distinctly shows the different
contributions to error (highlighted): for |J|T < 1, the residual energy is
dominated by non-adiabatic errors because the evolution moves too fast. For
|J|T > 2, the flattening out of the residual energy for the configurations
with two to six qubits indicates that gate errors dominate, because the
predictions from the ideal digital evolutions are substantially lower. For
the larger qubit configurations with seven to nine qubits, the residual
energy follows the digital predictions upwards, indicating that
digitization errors dominate. In addition, the residual energy visibly
decreases at |J|T = 1 for all configurations, implying that the digitized
evolutions are able to approach the target state even for nine qubits.

We also applied local fields to explore the lifting of degeneracy and 
generation of long-range correlations; see Supplementary Information. 

We next discuss how the digitized approach can solve stoquastic and 
non-stoquastic problems with comparable performance, by testing 
random problems on three, six, seven, eight and nine qubits. Problems 
have local fields and couplings with random strength and sign. We 
independently choose $B_x$ and $B_z$ from [−2, 2] for each spin and $J_{zz}$ from
[−2, −0.5] or [0.5, 2] for each link. This creates a random Ising problem
with frustration. For non-stoquastic problems we also add $J_{xx}$ coupling
for each link, with values from [−2, −0.5] or [0.5, 2], effectively
doubling the amount of entangling gates. We avoid small couplings to 
reduce the number of gates. For the three-qubit systems we used 


224 | NATURE | VOL 534 | 9 JUNE 2016 


[Figure 3 plot. Panel a: kink likelihood for four qubits, with curves labelled by number of kinks. Panel b: residual energy, with regions labelled 'Non-adiabatic error' and 'Gate error'. Horizontal axis: |J|T, ticks from 0.03 to 3.0.]

Figure 3 | Kink errors, residual energy and scaling with system size. 
a, Kink likelihood for the four-qubit configuration. Solid lines, 
experiment; dashed lines, ideal digital evolution; dotted lines, ideal 
continuous time evolution. b, Residual energy in the adiabatic evolutions 
of ferromagnetic chains ($J_{zz} = 2$) in configurations with two to nine qubits
(as indicated by the colour-coded numerals). The green solid line
shows the ideal square-root trend for the large-scale limit (Supplementary 
Information). Distinct contributions to error are highlighted. 


quantum state tomography on 100 separate instances to include off- 
diagonal elements in the fidelity metrics. For six or more qubits tomog- 
raphy is not practical and so we measured the correlated probabilities 
on 250 separate instances, and use a measure of success that is equal to 
$|\langle\Psi_{\mathrm{ideal}}|\Psi\rangle|^2$ (with $\Psi$ the wavefunction) to first order and sets an upper
bound on the fidelity: $\left(\sum_k \sqrt{P_{k,\mathrm{ideal}}\,P_k}\right)^2$, in which $P_{k,\mathrm{ideal}}$ and $P_k$ are
probabilities and k runs over the computational basis. In Fig. 4 we show 
the results for stoquastic problems with three, six and nine spins, and 
non-stoquastic problems with three, six and seven spins. For each case, 
we highlight a single instance and show histograms of the fidelities. 
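
The measure of success for the larger systems can be sketched directly from its definition, $\left(\sum_k \sqrt{P_{k,\mathrm{ideal}}\,P_k}\right)^2$, together with the uniform random baseline it is later compared against. The function names and the list-based input format are assumptions for illustration; both distributions are taken to be normalized and ordered over the same computational basis.

```python
import math

def success_measure(p_ideal, p_measured):
    """Squared sum of sqrt(P_ideal * P_measured) over the computational basis."""
    if len(p_ideal) != len(p_measured):
        raise ValueError("distributions must cover the same basis states")
    return sum(math.sqrt(a * b) for a, b in zip(p_ideal, p_measured)) ** 2

def uniform_baseline(p_ideal):
    """Success measure obtained if every outcome were equally likely."""
    n = len(p_ideal)
    return success_measure(p_ideal, [1 / n] * n)

if __name__ == "__main__":
    ideal = [1.0, 0.0, 0.0, 0.0]        # hypothetical two-qubit target distribution
    measured = [0.7, 0.1, 0.1, 0.1]     # hypothetical measured probabilities
    print(success_measure(ideal, measured))
    print(uniform_baseline(ideal))
```

The measure uses only outcome probabilities, so it cannot see relative phases; that is why it bounds the fidelity from above instead of equalling it.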

For the three-spin stoquastic problems, the real part of the density 
matrix of one instance and a histogram of its diagonal elements are 
shown in Fig. 4a. In the tomography plot (left panel of Fig. 4a), we 
overlay the experimental results (colour) with the ideal digital (black) 
and target state (grey) results. For this example, we find fidelities 
$\mathrm{tr}(\rho_{\text{ideal-digital}}\,\rho) = 0.70$ and $\mathrm{tr}(\rho_{\text{target-state}}\,\rho) = 0.63$. In the top right panel
of Fig. 4a, we show the histograms for all instances of the fidelities
$\mathrm{tr}(\rho_{\text{target-state}}\,\rho)$ in colour. The fidelity of the ideal digital evolution with
respect to the target state is shown in grey. Stoquastic problems with six 
and nine qubits are displayed in Fig. 4b and c, respectively. The main 
figures show the measured probabilities (colour) sorted by the target 
state results (grey), and the insets display the histograms. Results for 
the non-stoquastic problems are displayed in Fig. 4d-f. 

The key result from Fig. 4 is that the physical system can find solu- 
tions to non-stoquastic problems with a performance similar to that of 
stoquastic problems. The three-qubit examples show major diagonal 
as well as off-diagonal elements close to the expected positions. For 
six and more qubits, the coloured bars in the example instances are 
mostly on the left, indicating that the system has a clear preference for 
returning the probabilities associated with the ideal solutions. 

The physical system produces results that are comparable to the 
expectations, as demonstrated by the histograms showing a substantial 
overlap between experiment and theory. Moreover, the numbers are 
consistent, as we now discuss for the six-qubit stoquastic example. The 
mean success rate between the ideal adiabatic evolution and target state is
0.59 ± 0.01, indicating that the scaled time is large enough to capture
the evolution dynamics. The mean success rate of the ideal digitized
evolution with respect to the ideal adiabatic evolution is 0.73 ± 0.01,
indicating a proper Trotterization of the evolution. Finally, the value for

problems. As stoquastic problems we use frustrated Ising Hamiltonians,
with random local x and z fields, and random $\sigma_z\sigma_z$ couplings. a-c, Stoquastic
results for three, six and nine qubits. a, For three qubits we have done 
tomography. An example instance is provided on the left, where we 

show the real part of the density matrix p. Coloured bars denote the 
experimental data, and black and grey outlined bars show the ideal 
digital evolution and the target state, respectively. The diagonals of the 
experiment (colour) and the target state (grey) are shown in the bottom 
right panel (as indicated by the dashed arrow), sorted by ideal target state 
results. The fidelity results for all 100 instances are summarized in the 
histogram (top right), where ratio denotes the normalized occurrence; 
coloured bars, fidelities of experimental results with respect to the target 


the experimental evolution with respect to the ideal digitized evolution
is 0.714 ± 0.006, indicating that the experiment follows the ideal digital
evolution reasonably well. The product of these three numbers, 0.31,
is very close to the mean value between the experimental data and
the target state, 0.296 ± 0.007. This shows that the experimental errors
arise from comparable contributions of non-adiabatic, digital and gate
errors. For the six-qubit non-stoquastic case, experimental-to-target
state values are higher than this product, suggesting that errors
partially cancel. A further reason for the higher success rate could be
that the presence of $\sigma_x\sigma_x$ terms is helpful for difficult problems in
general. This experiment took up to nine qubits and up to $10^3$ gates. See
Supplementary Information for pulse sequences, gate counts, problem 
parameters and additional metrics. 

To further quantify the performance of the system, we compare 
experimental and random probabilities with the theoretical results. In 
essence, we take a uniform random distribution over the $2^N$ possible
measurement outputs as a baseline sanity check. We find that, for the
stoquastic problems, the measures of success of all six- to nine-qubit

state; grey bars, fidelities of the ideal digital evolution with respect to the
target state. b, c, The correlated probabilities for six (b) and nine (c) qubits, 
sorted by target state results. Experimental data are in colour, the target 
state is in grey. The results for all 250 instances are summarized in the 
insets. For the nine-qubit instance (c), the first 100 elements are shown. 

In a-c, the coloured squares surrounding or below the plots indicate qubit 
indices, as in Fig. 2. d-f, As in a-c, but for non-stoquastic problems, which 
have additional random $\sigma_x\sigma_x$ couplings. Here we plot the data for three,
six and seven qubits, for which the average measure of success is above the 
random baseline (not shown; see text). The results show that the system 
can find the ground states of stoquastic and non-stoquastic Hamiltonians 
with similar performance. 


configurations are significantly above this baseline: for six qubits, the
success measure of the experimental data with respect to the target
state is 0.296 ± 0.007, whereas using uniform random probabilities
produces a value of 0.168 ± 0.005. For the nine-qubit case the
numbers are 0.122 ± 0.006 for the experimental data and 0.074 ± 0.004
for random. For the non-stoquastic problems the numbers are
0.380 ± 0.009 and 0.335 ± 0.008 for the six-qubit configuration, and
0.311 ± 0.009 and 0.277 ± 0.008 for the seven-qubit configuration.
A complete listing for all configurations is provided in Supplementary 
Information. 

This experiment shows that digital synthesis of the adiabatic evo- 
lutions can be used to find signatures of the ground states of random 
stoquastic and non-stoquastic problems. Errors arise from a compara- 
ble contribution of non-adiabatic, digital and gate errors, and success 
rates are significantly above a uniform random baseline. For larger 
qubit systems, the number of Trotter steps needs to be limited to reduce 
the accumulation of gate error, in turn limiting the evolution we can 
simulate. Therefore, the experimental error is larger, arising from a
combination of gate, digitization and non-adiabatic error. However, in
an error-corrected system, the number of gates is in principle uncon- 
strained, digitization can be made arbitrarily accurate and one can 
move more slowly through critical parts of the evolution. Although 
we have used Trotterization”’, the scaling of the digitization becomes 
more appealing with recent methods based on the truncation of Taylor 
series”?. See Supplementary Information for further motivations and 
discussions. 

We believe that the digitized approach to adiabatic quantum evo- 
lutions of complex problems—where local fields, variable coupling 
strengths and types, and multibody interactions can be constructed— 
would become viable on the small scale with lower gate errors, and that 
large-scale applications could be achieved in conjunction with error 
correction. We hope our work accelerates further improvements in 
superconducting quantum systems and motivates research into the 
encoding and measurement of non-stoquastic computational problems. 
In addition, we anticipate that these results encourage work on the effi- 
cient digitization of algorithms for small- and large-scale systems, for 
which reducing the effects of noise by, for example, dynamical
decoupling techniques, or reducing the circuit complexity is paramount.

Metastable high-entropy dual-phase alloys
overcome the strength-ductility trade-off


Zhiming Li¹, Konda Gokuldoss Pradeep¹, Yun Deng¹, Dierk Raabe¹ & Cemal Cem Tasan¹,²


Metals have been mankind’s most essential materials for thousands 
of years; however, their use is affected by ecological and economical 
concerns. Alloys with higher strength and ductility could alleviate 
some of these concerns by reducing weight and improving energy 
efficiency. However, most metallurgical mechanisms for increasing 
strength lead to ductility loss, an effect referred to as the strength- 
ductility trade-off!. Here we present a metastability-engineering 
strategy in which we design nanostructured, bulk high-entropy 
alloys with multiple compositionally equivalent high-entropy 
phases. High-entropy alloys were originally proposed to benefit 
from phase stabilization through entropy maximization* ©. Yet 
here, motivated by recent work that relaxes the strict restrictions 
on high-entropy alloy compositions by demonstrating the weakness 
of this connection’"', the concept is overturned. We decrease phase 
stability to achieve two key benefits: interface hardening due to 
a dual-phase microstructure (resulting from reduced thermal 
stability of the high-temperature phase!”); and transformation- 
induced hardening (resulting from the reduced mechanical stability 
of the room-temperature phase’*). This combines the best of two 
worlds: extensive hardening due to the decreased phase stability 
known from advanced steels!*!° and massive solid-solution 
strengthening of high-entropy alloys*. In our transformation- 
induced plasticity-assisted, dual-phase high-entropy alloy (TRIP- 
DP-HEA), these two contributions lead respectively to enhanced 
trans-grain and inter-grain slip resistance, and hence, increased 
strength. Moreover, the increased strain hardening capacity 
that is enabled by dislocation hardening of the stable phase and 
transformation-induced hardening of the metastable phase 
produces increased ductility. This combined increase in strength
and ductility distinguishes the TRIP-DP-HEA alloy from other
recently developed structural materials’®!”. This metastability- 
engineering strategy should thus usefully guide design in the near- 
infinite compositional space of high-entropy alloys. 

To realize the TRIP-DP-HEA concept, we switch from the
equiatomic Fe₂₀Mn₂₀Ni₂₀Co₂₀Cr₂₀ (atomic per cent, at%; ref. 6) system to the
non-equiatomic Fe₈₀₋ₓMnₓCo₁₀Cr₁₀ (at%) system, which exhibits
partial martensitic transformation of the face-centred cubic (f.c.c.) to 
the hexagonal close-packed (h.c.p.) phase upon cooling from the high- 
temperature single-phase region. This change enables development of 
a dual-phase microstructure in which both phases obtain the maxi- 
mum benefit of the solid-solution strengthening effect and one phase, 
owing to the decreased stacking fault energy'’, undergoes deformation- 
induced displacive transformation. The partial martensitic transforma- 
tion during quenching is the only possible approach that can lead to the 
formation of a DP-HEA with phases of identical chemical composition 
(that is, high-entropy phases). The alloys were synthesized with var- 
ying Mn contents in a vacuum induction furnace using pure metals, 
hot-rolled to 50% thickness at 900°C, homogenized at 1,200°C for 2h 
in an Ar atmosphere, and water-quenched. Further grain refinement 
was achieved by cold-rolling (to 60% thickness) and 3-min annealing 
at 900°C in an Ar atmosphere. The chemical composition of the HEAs 
measured by wet-chemical analysis is given in Extended Data Table 1. 
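
As a small illustration of the composition series, the sketch below builds the nominal Fe₈₀₋ₓMnₓCo₁₀Cr₁₀ compositions for the Mn contents studied and evaluates the ideal configurational mixing entropy, $-\sum_i c_i \ln c_i$ in units of the gas constant R. The ideal-solution formula is a textbook approximation that is not taken from this work, and the function names are hypothetical; nominal compositions also differ slightly from the measured ones in Extended Data Table 1.

```python
import math

def nominal_composition(mn_at_percent):
    """Nominal Fe(80-x) Mn(x) Co10 Cr10 composition in at%."""
    return {"Fe": 80 - mn_at_percent, "Mn": mn_at_percent, "Co": 10, "Cr": 10}

def ideal_mixing_entropy_in_R(composition):
    """Ideal configurational entropy, -sum(c ln c), in units of R."""
    total = sum(composition.values())
    return -sum((c / total) * math.log(c / total)
                for c in composition.values() if c > 0)

if __name__ == "__main__":
    equiatomic = {"Fe": 20, "Mn": 20, "Ni": 20, "Co": 20, "Cr": 20}
    print("equiatomic", ideal_mixing_entropy_in_R(equiatomic))
    for x in (45, 40, 35, 30):
        comp = nominal_composition(x)
        print(x, comp, ideal_mixing_entropy_in_R(comp))
```

Under this approximation the non-equiatomic alloys have a lower mixing entropy than the five-component equiatomic system, which is consistent with the text's point that the strategy deliberately relaxes entropy maximization.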

Microstructure characterization down to 30-nm resolution reveals
that the Fe₈₀₋ₓMnₓCo₁₀Cr₁₀ (at%) system indeed demonstrates the
targeted change in phase stability (see the X-ray diffraction (XRD)
and electron backscatter diffraction (EBSD) data in Fig. 1). A single
f.c.c. phase structure was obtained when the Mn content was 45 at%
and 40 at% (Fe₃₅Mn₄₅Co₁₀Cr₁₀ and Fe₄₀Mn₄₀Co₁₀Cr₁₀, respectively).


Figure 1 | XRD patterns and EBSD phase maps of Fe₈₀₋ₓMnₓCo₁₀Cr₁₀
(x = 45 at%, 40 at%, 35 at% and 30 at%) HEAs. θ is the Bragg angle.
The Mn content plays an important part in phase constitution, tuning
phase stability for the activation of specific displacive transformation
mechanisms, for example, enabling TWIP or TRIP effects. We note that
the 35 at% Mn alloy has only trace amounts of the h.c.p. phase, and
hence is not referred to as a DP-HEA.

Figure 2 | Elemental homogeneity among the two phases of
Fe₅₀Mn₃₀Co₁₀Cr₁₀ (at%) HEA. a, Energy-dispersive spectroscopy maps of
the shown EBSD-mapped sample region. b, Three-dimensional APT tip
reconstructions of Fe, Mn, Co, Cr atom positions in a typical APT tip
from the EBSD-mapped phase boundary. The crosses refer to the positions
that the APT tips were taken from.


These two alloys demonstrate a transition in the deformation mech- 
anisms from dislocation-dominated plasticity in the former’ to twin- 
ning-induced plasticity (TWIP) in the latter’’, confirming the targeted 
stability trend realized by tuning the stacking fault energy. A further 
decrease to 35 at% Mn leads to traces of h.c.p. phase (not captured by 
XRD). Finally, a decrease to 30 at% Mn (Fe₅₀Mn₃₀Co₁₀Cr₁₀)
successfully produces the desired dual-phase microstructure with ~28% h.c.p.
phase. This alloy is analysed in more detail in the following. 

The two phases constituting the as-quenched Fe₅₀Mn₃₀Co₁₀Cr₁₀
(at%) alloy are the f.c.c. matrix (of ~45-µm grain size) and the h.c.p.
ε phase laminate layers (ranging from several nanometres to 10 µm in
thickness). In Fig. 2, energy dispersive spectroscopy and atom probe 
tomography (APT) maps are also provided for the corresponding EBSD 
maps, respectively, to reveal the compositional distribution among 
the two phases. The energy dispersive spectroscopy maps in Fig. 2a 
show that all elements are uniformly distributed, suggesting that both 
phases benefit from the same level of solid solution strengthening. 
APT tips were lifted out from a phase boundary region (using the 
method outlined in ref. 19) shown in the EBSD phase map in Fig. 2b, to
rule out the possibility of atomic-scale elemental partitioning between
the f.c.c. and the h.c.p. phases. The analysis reveals that the investi- 
gated volume has an overall composition of Fess 6Mno7.6Co1;,3Cri23 
(at%), showing values near the nominal bulk composition. No appar- 
ent elemental segregations can be observed in the three-dimensional 
reconstructions (Fig. 2b) or from the statistical binomial frequency 
distribution analyses (Extended Data Fig. 1), confirming the uniform 
distribution of all elements even at phase boundaries. This is different 
from Mn-containing steels, which show substantial chemical gradients 
across phase boundaries”?”!. 

Figure 3a shows the mechanical response of the DP-HEA for the 
coarse-grained (as-homogenized, grain size of ~45 µm) and
grain-refined (recrystallized, grain size of ~4.5 µm) states. To emphasize the
substantial improvement in the properties upon grain refinement, the
curves for two other single-f.c.c.-phase HEAs (Fe₃₇Mn₄₅Co₉Cr₉ (ref. 8)
and Fe₂₀Mn₂₀Ni₂₀Co₂₀Cr₂₀ (ref. 6) (at%)) are also presented.

The mechanical response of the TRIP-DP-HEA is striking even 
before grain refinement. It exhibits vastly higher strength and
ductility compared to the single-phase Fe₃₇Mn₄₅Co₉Cr₉ (at%) HEA. More

Figure 3 | Mechanical behaviour of the TRIP-DP-HEAs compared to
various single-phase HEAs. Grain sizes are shown in micrometres. 

a, Tensile properties. The tensile-curve data of single-phase 
Fe₂₀Mn₂₀Ni₂₀Co₂₀Cr₂₀ (at%) in ref. 6 is also shown here. The inset shows
the increments of (change in) strength and ductility in refs 16 and 17 and
this work; ultimate tensile strength and elongation to fracture were used as
strength and ductility. Inset labels a, b and c represent the heterogeneous
lamella Ti60, Ti80 and Ti100 versus coarse-grained Ti, respectively;
d, e and f represent the high-specific-strength steels I, II and III versus
weight-reduced Fe-Al-Mn-C steel, respectively; g and h represent
the coarse-grained (grain size ~45 µm) and grain-refined (grain size
~4.5 µm) DP-HEAs versus the single-phase Fe₃₇Mn₄₅Co₉Cr₉ (at%)
HEA, respectively. b, Strain-hardening response. The inset shows how
the stability of the f.c.c. phase was optimized upon grain refinement
to increase the strain-hardening ability; the data points in the inset are
means ± standard deviation of three tests.

Figure 4 | Deformation micro-mechanisms in the TRIP-DP-HEA with
increasing tensile deformation at room temperature. a, EBSD
phase maps revealing the deformation-induced
martensitic transformation as a function of
deformation. $\varepsilon_{\mathrm{loc}}$ is the local strain and TD is the
tensile direction. b, ECCI analyses showing the
evolution of defect substructures in the f.c.c. and
h.c.p. phases. g is the diffraction vector, γ is the
f.c.c. phase and ε is the h.c.p. phase. c, Schematic
sketches illustrating the sequence of micro-processes
in the TRIP-DP-HEA.

importantly, it has a mechanical response almost identical to that of
the (grain-refined) single-phase Fe₂₀Mn₂₀Ni₂₀Co₂₀Cr₂₀ (at%), the most
successful HEA so far. On grain refinement, TRIP-DP-HEA notably
outperforms the Fe₂₀Mn₂₀Ni₂₀Co₂₀Cr₂₀ (at%) HEA. Furthermore, the
inset in Fig. 3a demonstrates that this approach in TRIP-DP-HEA 
has the potential to lead to superior improvements in strength- 
ductility combinations compared to those obtained in other studies 
that focus on conventional low-entropy systems!®!”, Here, the duc- 
tility improvements observed in metallic glass matrix composites”?4 
are not shown for comparison, since for these materials the absolute 
levels of uniform tensile elongation are very low (for example, <3%)4 
even when improved. 

We note that the TRIP-DP-HEA reported in this work was designed 
mainly with the aim of proof of the proposed principle. In our opin- 
ion, much more substantial improvements can be achieved following 
the principles proposed here, if the microstructures and composi- 
tions are optimized further. Figure 3b reveals that these improve- 
ments correspond to a higher work hardening rate in the DP-HEA 
Fe₅₀Mn₃₀Co₁₀Cr₁₀ (at%) than in the single-phase HEAs. This also
contributes to an extended uniform deformation process (Extended Data
Fig. 2a). There is a notable difference between the strain hardening
responses of the coarse-grained and grain-refined Fe₅₀Mn₃₀Co₁₀Cr₁₀
(at%) HEAs, which is linked to the size dependence of the f.c.c. phase 
stability (see the inset in Fig. 3b). 

These large improvements in the mechanical properties of the 
TRIP-DP-HEA arise from the underlying plastic accommodation and 




hardening processes. By coupling EBSD (Fig. 4a) with electron channel- 
ling contrast imaging (ECCI’*) (Fig. 4b), here we unravel these underly- 
ing processes for the case of the coarse-grained TRIP-DP-HEA (Fig. 4c). 

We first focus on the f.c.c. phase. EBSD phase maps reveal that the 
f.c.c. phase is metastable, as desired. It exhibits deformation-stimulated 
martensitic transformation (f.c.c. → h.c.p.) as a primary
deformation mechanism (Fig. 4a). The importance of this mechanism in the
observed hardening response can be assessed by comparing the two
TRIP-DP-HEAs with different grain sizes: when the stability of the f.c.c.
phase is optimized such that martensitic transformation is observed
over an extended deformation regime (as for the grain-refined
TRIP-DP-HEA; see inset of Fig. 3b), the overall ductility is increased
(compare the grain-refined and coarse-grained TRIP-DP-HEAs in Fig. 3a).

The ECCI analysis reveals the evolution of the deformation
substructure in the f.c.c. phase (Fig. 4b). Prior to deformation (local
strain, $\varepsilon_{\mathrm{loc}} = 0$), a large number of stacking faults is observed in the
TRIP-DP-HEA. Stacking faults present in the f.c.c. γ phase are formed
by gliding of Shockley partials of 1/6⟨112⟩ Burgers vector. These
features constitute thin plates of h.c.p. structure (that is, several atomic
monolayers of stacking faults). These thin h.c.p. plates have been
shown to act as the nuclei of the ε martensite phase, which forms
through the overlapping of stacking faults. The observed stacking
faults in the undeformed HEA are initial faults that did not sufficiently
coalesce to form the thermally induced h.c.p. ε phase but are likely to
act as phase-formation nuclei when subjected to externally applied
mechanical loads.

At early stages of deformation, mechanically induced transformation
from the f.c.c. phase to the h.c.p. ε phase acts as the primary
deformation mechanism (Fig. 4a). Since the stacking faults act as nuclei for
the formation of the h.c.p. ε phase, a large number of stacking faults are
required in the f.c.c. phase to realize the transformation from f.c.c. to
h.c.p. phase at this stage. This is well documented by ECCI, together
with an increase in the dislocation density (see the $\varepsilon_{\mathrm{loc}} = 10\%$ and
$\varepsilon_{\mathrm{loc}} = 30\%$ states in Fig. 4b). Thus, dislocation plasticity and martensitic
transformation plasticity are both activated at similar deformation levels.
The increased phase boundary density due to transformation creates
additional obstacles to dislocation slip, thereby contributing to the strain
hardening. With increasing strain, transformation from the f.c.c. to the
h.c.p. ε phase continues to be the dominant deformation mechanism, yet
dislocation activity in the f.c.c. γ phase becomes more important. At 65%
local strain (corresponding to the post-necking state, see Extended Data
Fig. 2b), only ~16% of the f.c.c. phase is retained (Fig. 4a).

We next focus on the h.c.p. phase. The TRIP-DP-HEA was
water-quenched after homogenization at 1,200°C, so the starting h.c.p. ε
phase is thermally induced by the martensitic transformation. Neither
the thermally induced ε regions nor the mechanically induced h.c.p.
regions show notable deformation-induced features at low strain 
levels. Gradually, deformation-induced twinning was observed in the 
h.c.p. phase as an important deformation mechanism (Fig. 4b). Thus, 
as the local strain increases to 30% and then to 45%, an increase in the 
density of both mechanical nano-twins and stacking faults is observed 
in the h.c.p. phase (Fig. 4b). The phenomenon of twinning in the 
deformation-induced h.c.p. martensite has also been observed in other 
types of alloys’®. This mechanism contributes profoundly to strain 
hardening through the dynamic Hall-Petch effect. This means that 
the interface density, through the continuously formed twins, increases 
constantly’. Further increase of the deformation leads to the presence 
of a high density of dislocations (Fig. 4b). Thus, the h.c.p. ε phase plays
an important part in plastic accommodation and hardening at later 
stages of deformation via multiple deformation mechanisms (that is, 
dislocation slip, twinning and the formation of stacking faults). 

These deformation micro-mechanisms (Fig. 4) and the impressive 
mechanical response (Fig. 3) confirm the success of this method of 
simultaneously achieving greatly improved strength (from massive 
solid solution strengthening and the increased interface density) and 
ductility (from dislocation-plasticity and transformation-induced 
hardening). The synergic deformation of the two phases leads to a 
highly beneficial dynamic strain-stress partitioning effect, with a
decreased likelihood of damage nucleation owing to their elastic com- 
pliance. Such damage resistance is absent in most dual-phase alloys 
with high mechanical contrast across their hetero-interfaces’”. 

Our effort to combine the best characteristics of steels and HEAs has 
led to the design of a new class of transformation-induced
plasticity-assisted, dual-phase HEA. The originally proposed HEA concept
has motivated enormous efforts to design new alloys, but few of
the resulting alloys have shown properties that justify the increased
alloying content, in contrast to the alloy presented here, which
exhibits excellent strength-ductility combinations. We emphasize that this
alloy design strategy is opposite in approach to that generally used
in HEA design: rather than focusing on phase stabilization and
single-phase formation, we propose that phase metastability and
ductile multi-phase configurations should be important future
research goals in this field. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


During the processing of the HEAs presented in this work, the ingot was first
cast in a vacuum induction furnace using pure metals (purity higher than
99.8 wt%) with predetermined compositions. The as-cast ingot with dimensions
of 10 × 50 × 150 mm³ was subsequently hot-rolled at 900°C with a rolling
reduction ratio of 50% (thickness changed from 10 mm to 5 mm). After hot-rolling,
the alloy was homogenized at 1,200°C for 2 h in an Ar atmosphere followed by
water-quenching. For the Fe50Mn30Co10Cr10 (at%) DP-HEA, further grain
refinement was achieved through cold-rolling with a reduction ratio of 60% and
subsequent recrystallization annealing at 900°C in an Ar atmosphere for 3 min
followed by water-quenching. The bulk chemical compositions of all the studied
alloys were measured by wet-chemical analysis (Extended Data Table 1).

The microstructures of the alloys were analysed using multiple techniques. 
EBSD measurements were performed using a Zeiss-Crossbeam XB 1540 focused 
ion beam scanning electron microscope (SEM) with a Hikari camera and the TSL 
OIM data-collection software
(http://www.edax.com/Products/EBSD/OIM-Data-Collection-EBSD-SEM.aspx).
Back-scattered electron imaging and ECCI analyses were carried out using a
Zeiss-Merlin instrument. The chemical uniformity
was investigated using energy-dispersive X-ray spectroscopy at the microscopic
scale, and APT (LEAP 3000X HR, Cameca Inc.) at the atomic scale. The APT tips
were produced using a focused ion beam (FEI Helios Nanolab 600i) from regions
including phase and grain boundaries revealed by a prior EBSD scan.

Flat specimens for tensile testing, with a thickness of 1 mm, were sectioned from 
the homogenized and water-quenched alloy by electrical discharge machining. 
The gauge length and width of the tensile specimens were 10 mm and 2.5 mm,
respectively. Uniaxial tensile tests were carried out at ambient temperature using 
a Kammrath & Weiss tensile stage at the strain rate of 1 x 10-*s~!. Five samples 
for each material were tensile-tested to confirm reproducibility. The local strain 
evolution during tensile testing was determined by digital image correlation using
the Aramis system (GOM GmbH, http://www.gom.com/metrology-systems/ 
system-overview/aramis.html). 

The deformation mechanisms in the DP-HEAs were investigated by EBSD 
and ECCI at different regions of the fractured tensile sample with different local
strain levels. All of the sample regions analysed by ECCI were first measured 
by EBSD to obtain the specific orientation information corresponding to each 
region. 
Extended Data Figure 1 key (fit-quality parameters for each element):

Element   Reduced χ²   n_d   p-value   μ
Fe        0.736        37    0.8799    0.0187
Mn        1.228        34    0.1698    0.0231
Co        0.798        24    0.7439    0.0156
Cr        1.138        24    0.2902    0.0187
Extended Data Figure 1 | Statistical binomial frequency distribution
analysis results for the APT tip. The statistical analysis shows that the tip
has an overall composition of Fe48.8Mn27.6Co11.3Cr12.3 (at%). The binomial
curves obtained from the experiments match the curves corresponding
to a total random distribution. The quality of the fit was quantified using
several parameters, as listed in the key. n_d is the number of degrees of
freedom for a given ion. The values of the normalized homogenization
parameter μ for all four elements are close to 0, confirming the random
distribution of elements in the DP-HEA.
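
The frequency distribution analysis summarized in this caption can be illustrated with a minimal sketch. This is not the authors' analysis code. It assumes the common approach of splitting the ion sequence into blocks of fixed size, comparing the histogram of solute counts per block with a binomial distribution, and normalizing χ² into μ = sqrt(χ²/(N + χ²)), where N is the number of blocks. The block size, the rule for skipping sparsely populated bins, the degrees-of-freedom convention and every function name are assumptions made for illustration, and the data are simulated.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of k solute ions in a block of n ions for a random alloy."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def frequency_distribution_test(block_counts, block_size, min_expected=5.0):
    """Compare observed per-block solute counts with a binomial distribution.

    Returns (reduced chi-square, degrees of freedom, mu).
    """
    n_blocks = len(block_counts)
    # Overall solute fraction, estimated from the data themselves.
    p = sum(block_counts) / (n_blocks * block_size)

    observed = [0] * (block_size + 1)
    for count in block_counts:
        observed[count] += 1

    chi2 = 0.0
    bins_used = 0
    for k in range(block_size + 1):
        expected = n_blocks * binomial_pmf(k, block_size, p)
        if expected < min_expected:
            continue  # assumed convention: ignore sparsely populated bins
        chi2 += (observed[k] - expected) ** 2 / expected
        bins_used += 1

    # Assumed convention: one degree of freedom lost to normalization,
    # one to estimating p from the data.
    dof = max(bins_used - 2, 1)
    mu = math.sqrt(chi2 / (n_blocks + chi2))  # 0 = random, towards 1 = segregated
    return chi2 / dof, dof, mu


def simulate_blocks(n_blocks, block_size, fractions, rng):
    """Simulated data: block i draws its ions with solute fraction fractions[i % len]."""
    counts = []
    for i in range(n_blocks):
        p = fractions[i % len(fractions)]
        counts.append(sum(1 for _ in range(block_size) if rng.random() < p))
    return counts


rng = random.Random(0)
random_alloy = simulate_blocks(2000, 100, [0.3], rng)       # homogeneous
clustered = simulate_blocks(2000, 100, [0.1, 0.5], rng)     # strongly segregated
print("random   :", frequency_distribution_test(random_alloy, 100))
print("clustered:", frequency_distribution_test(clustered, 100))
```

In this sketch a homogeneous simulated alloy gives μ near 0 and a strongly segregated one gives μ much closer to 1, which is the sense in which values close to 0 are read as a random distribution.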
Extended Data Figure 2 | Strain distribution within the DP-HEA
sample upon room-temperature deformation. a, Evolution of local
strain with increasing the global strain (ε_global), indicating an extended
uniform deformation process. The red dotted circles in a indicate the local
strain values corresponding to various positions in the fractured tensile
sample shown in b; four positions with local strains of 10%, 30%, 45%
and 65% were highlighted by percentages in red and the corresponding
microstructures are shown in Fig. 4. b, Digital image correlation strain
map shows the local strain distribution of the tensile sample following
fracture. 0 to 11 in b refers to the distance of the sample position from
the fracture surface, corresponding to the distance values shown in a.
Extended Data Table 1 | Chemical composition of the studied alloys in atomic per cent according to wet-chemical analysis

Strongly correlated perovskite fuel cells


You Zhou, Xiaofei Guan, Hua Zhou, Koushik Ramadoss, Suhare Adam, Huajun Liu, Sungsik Lee, Jian Shi,
Masaru Tsuchiya, Dillon D. Fong & Shriram Ramanathan

Fuel cells convert chemical energy directly into electrical energy 
with high efficiencies and environmental benefits, as compared 
with traditional heat engines. Yttria-stabilized zirconia is
perhaps the material with the most potential as an electrolyte
in solid oxide fuel cells (SOFCs), owing to its stability and
near-unity ionic transference number. Although there exist materials
with superior ionic conductivity, they are often limited by their
ability to suppress electronic leakage when exposed to the reducing
environment at the fuel interface. Such electronic leakage reduces
fuel cell power output and the associated chemo-mechanical
stresses can also lead to catastrophic fracture of electrolyte
membranes. Here we depart from traditional electrolyte design
that relies on cation substitution to sustain ionic conduction. 
Instead, we use a perovskite nickelate as an electrolyte with high 
initial ionic and electronic conductivity. Since many such oxides 
are also correlated electron systems, we can suppress the electronic 
conduction through a filling-controlled Mott transition induced 
by spontaneous hydrogen incorporation. Using such a nickelate 
as the electrolyte in free-standing membrane geometry, we 
demonstrate a low-temperature micro-fabricated SOFC with high 
performance. The ionic conductivity of the nickelate perovskite 
is comparable to the best-performing solid electrolytes in the 
same temperature range, with a very low activation energy. The 
results present a design strategy for high-performance materials 
exhibiting emergent properties arising from strong electron 
correlations. 

SmNiO3 (SNO) belongs to a series of rare-earth nickelates (RNiO3
or RNO) with the perovskite structure (ABO3), which exhibits linked
corner-shared BO6 octahedra (Fig. 1a). In perovskite oxides, protons
can form ionic defects ($\mathrm{OH_O^{\bullet}}$ in Kröger-Vink notation) by bonding
with oxygen, and diffuse through a Grotthuss mechanism that
involves the fast rotational diffusion of the protonic defects and the
rate-limiting proton transfer to the neighbouring oxygen ions. The
transition states of the proton rotation and proton transfer require
local lattice distortions such as elongation and bending of the B-O
bond, respectively. The schematic of proton incorporation and
diffusion processes for a cubic perovskite is shown in Fig. 1b with the
following processes: (i) proton incorporation, (ii) rotational diffusion,
(iii) transfer to neighbouring oxygen, (iv) bending and (v) elongation
of the B-O bond. In SNO, proton incorporation and diffusion happen
in a similar way, albeit with several different characteristics, as will be
discussed in more detail later (Fig. 1c).

In the low-temperature fuel cell operation range (300-500 °C),
stoichiometric SNO shows metallic conductivity with an electrical
resistivity of ~1 mΩ cm, which is detrimental to electrolyte applications.
The high electronic conductivity is due to single electron occupancy
on the fourfold degenerate e_g manifold (including spin) on Ni³⁺, as
shown in Fig. 1d (in the ionic limit; the covalent limit cases are shown
in Extended Data Fig. 1a and b), where carriers can migrate without
overcoming the on-site Coulomb repulsion. When electrons are doped
into SNO via hydrogenation and the valence of nickel is reduced to
Ni²⁺ (with overall reaction
$\mathrm{Ni^{3+} + O_O^{\times} + \tfrac{1}{2}H_2 \rightarrow Ni^{2+} + OH_O^{\bullet}}$),
however, electronic transport through the e_g manifold will be suppressed
by the Hubbard intra-orbital electron-electron Coulomb interaction
U (Fig. 1e). Such filling-controlled Mott transitions enable the
application of hydrogenated SNO as an electrolyte, owing to its wide
electronic bandgap, which is close to the Ni intra-orbital Coulomb
repulsion and large enough to suppress electronic conductivity.
Spontaneous incorporation of protons into SNO upon hydrogen
exposure without any electrical bias at low temperatures can be seen in
Extended Data Fig. 2. This is unlike typical perovskite proton
conductors such as yttrium-doped BaCeO3 and BaZrO3, where subvalent
cations are needed as substitutional acceptors to facilitate the
hydrogen-incorporation process (Fig. 1b). Therefore the concentration of
protons in SNO may not be limited by the oxygen vacancy concentration,
as commonly noted in acceptor-doped electrolytes. The electronic
transport mechanism in H-SNO is characterized by the
Efros-Shklovskii variable range hopping mechanism, in which small
polarons form because of strong electron-lattice coupling in the presence
of a Coulomb gap (Extended Data Fig. 1c-e).

Figure 1f illustrates how this collective quantum mechanical effect 
enables the electrolyte design. Initially no power output is extracted 
from the SNO-electrolyte fuel cell because of the high electronic con- 
ductivity in pristine SNO. When the hydrogen fuel is introduced at the 
anode (catalytic Pt or Pd), hydrogen molecules dissociate into protons 
and donate electrons to Ni(III) in SNO at the triple phase boundaries.
The hydrogenation process creates an electrically insulating H-SNO on 
the anode side. Once this insulating layer is formed, as long as hydrogen 
fuel is supplied, protons can continue to diffuse under the chemical 
potential gradient, while the electron transport through H-SNO 
directly to the cathode is strongly suppressed by carrier localization. 
As a result, electrons are forced to pass through the external circuit 
and generate electrical power. 

Figure 1 | Solid electrolyte design principle based on the emergent
phase arising from strong correlations. a, The distorted perovskite
structure of the SNO crystal. b, c, Proton incorporation and conduction
mechanisms in a conventional solid-state electrolyte (A, B and M are metal
cations and O is the oxygen anion) (b) and the proposed new electrolyte (c).
(i) Proton incorporation. (ii) and (iii), Proton transport by rotational
diffusion within an octahedron (ii) and transfer to a neighbouring oxygen
ion facilitated by the hydrogen bond (dashed red line) (iii). (iv) and
(v), The bending (iv) and the stretching (v) of the metal-oxygen bond
promote processes (iii) and (ii), respectively. In conventional electrolytes,
substitutional sub-valent cations, M, are needed to facilitate the hydrogen
incorporation. In SNO, proton incorporation can happen spontaneously.
The ligand holes in SNO reduce the effective charge of the oxygen ions
(only two of them are explicitly shown). d, e, The electronic configuration
of Ni 3d orbitals for the pristine (d) and the electron-doped (e) SNO
in the ionic limit. Electronic transport is suppressed by the on-site
electron-electron correlation U upon electron doping (e). f, A schematic
of a SNO-electrolyte SOFC and its operation mechanism. Spontaneous
hydrogen incorporation creates a strongly correlated insulating layer and
suppresses the electronic current. TPB, triple phase boundary.

The time evolution of the open-circuit voltage (OCV) in a
micro-fabricated SOFC with a free-standing SNO membrane (see
Extended Data Figs 3 and 4 for the device structure and fabrication)
as the electrolyte verifies the above mechanism (Extended Data
Fig. 5a). Initially there is no OCV as the cell is electrically shorted by
pristine SNO. The OCV increases under continuous hydrogen flow
after the temperature becomes stabilized, as the H-SNO phase forms
on the anode side, and reaches a stable output when the stationary
state is reached. The current-voltage characteristics of the
micro-fabricated SOFCs (Fig. 2a) exhibit typical activation polarization,
ohmic loss and concentration polarization behaviour, and the power
output reaches a maximum value of 225 mW cm⁻² at 500°C, which
is comparable to the best-performing proton conducting fuel cells
(ref. 14 and references therein). The highest OCV achieved (1.03 V)
is close to the Nernst potential (~1.07 V), showing that the ionic
transference number is close to unity, with the electronic conduction
almost completely suppressed. The deviation of OCV from the Nernst
potential in general could be related to gas leakage and residual
electronic conductivity. The ionic transference number of H-SNO
at 500 °C is estimated to be 0.96 using the standard electromotive
force method. Because there is a small yet finite current inside the
fuel cells through the electrolyte under the OCV condition, the
electrode polarization loss (in addition to ohmic loss) may contribute
to the deviation of the measured OCV from the ideal value. Therefore,
a method that considers the polarization loss may also be used
to evaluate the ionic transference number (Extended Data
Fig. 5b). Increasing the electrolyte thickness typically enhances
the measured OCV, possibly owing to the reduced probability
of pinholes in the membrane and a decrease in the relative ratio
between electrode polarization and electrolyte resistance (see Extended
Data Fig. 6a and additional discussions in the Supplementary
Information).
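
As a worked illustration of the electromotive force estimate quoted in this paragraph, the sketch below takes the ionic transference number as the ratio of the measured OCV to the Nernst potential. It uses only the two voltages stated in the text. The function name is hypothetical, and the simple ratio ignores the electrode polarization correction that the text mentions as a refinement.

```python
def ionic_transference_number(ocv_measured_v, nernst_potential_v):
    """Electromotive force estimate: t_ion ~ OCV / E_Nernst (polarization ignored)."""
    if nernst_potential_v <= 0:
        raise ValueError("Nernst potential must be positive")
    return ocv_measured_v / nernst_potential_v


# Values quoted in the text: highest OCV 1.03 V, Nernst potential ~1.07 V.
t_ion = ionic_transference_number(1.03, 1.07)
print(f"t_ion = {t_ion:.2f}")  # prints "t_ion = 0.96"
```

The ratio reproduces the value of 0.96 reported for H-SNO at 500 °C.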

H-SNO fuel cells with dense Pd anodes also produce power output,
indicating that protons rather than oxygen ions are the dominant
mobile ion species in the material (Extended Data Fig. 6b). In addition,
these fuel cells can also work under pure H2 and are stable over
tens of hours (Extended Data Figs 6b and 7). The Nyquist plot of the
cell under OCV conditions measured at 500°C is shown in Fig. 2b.
The extrapolated area specific resistance of H-SNO, one of the key
performance metrics, is remarkably low: 0.045 Ω cm² at 500°C,
which is less than one-third of the general target value (0.15 Ω cm²)
for the area specific resistance of an oxide electrolyte. The Nyquist
plot can be modelled by an equivalent circuit with an ohmic resistor,
R_ohm, and serial elements each consisting of a resistor, R_i (i = 1, 2, 3),
and a constant phase element, CPE_i (i = 1, 2, 3), as summarized in
Supplementary Table 1. The three semicircles in the Nyquist plot
originate from the electric double-layer capacitance at the anode and
cathode, and the pseudocapacitance related to hydrogen incorporation
in SNO (see the footnote to Supplementary Table 1).
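
The equivalent circuit described in this paragraph can be sketched numerically as follows. The sketch assumes that each serial element is a resistor R_i in parallel with its constant phase element CPE_i, the usual arrangement that yields one depressed semicircle per element; the text does not state the topology explicitly. Apart from R_ohm = 0.045 Ω cm², which is the area specific resistance quoted above, every parameter value is a hypothetical placeholder and not a fitted value from Supplementary Table 1.

```python
def cpe_impedance(omega, q, n):
    """Impedance of a constant phase element, Z = 1 / (Q * (j*omega)**n)."""
    return 1.0 / (q * (1j * omega) ** n)


def cell_impedance(omega, r_ohm, branches):
    """R_ohm in series with (R_i parallel to CPE_i) elements; omega in rad/s."""
    z = complex(r_ohm, 0.0)
    for r_i, q_i, n_i in branches:
        z_cpe = cpe_impedance(omega, q_i, n_i)
        z += (r_i * z_cpe) / (r_i + z_cpe)
    return z


R_OHM = 0.045  # ohm cm^2, value quoted in the text
# Hypothetical (R_i, Q_i, n_i) triples, chosen only to give three separated arcs.
BRANCHES = [(0.05, 1e-5, 0.9), (0.2, 1e-3, 0.8), (0.5, 1e-1, 0.8)]

for omega in (1e6, 1e4, 1e2, 1e0, 1e-2):
    z = cell_impedance(omega, R_OHM, BRANCHES)
    # Nyquist plots show Re(Z) against -Im(Z).
    print(f"omega={omega:8.0e}  Re(Z)={z.real:.4f}  -Im(Z)={-z.imag:.4f}")
```

At high frequency the model collapses to R_ohm, and at low frequency to R_ohm plus the sum of the R_i, which is how the ohmic contribution is read off the real-axis intercept of a Nyquist plot.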

Figure 2c shows the ionic conductivity (calculated by the
electromotive force method) of H-SNO measured from free-standing
Pt/H-SNO/Pt micro-fabricated SOFCs and H-SNO epitaxial films
on LaAlO3 (LAO) (001) (indexed in pseudocubic notation),
compared with several other best-performing oxygen-ion-conducting and
proton-conducting electrolyte materials (refs 17-21). SNO has a high ionic
conductivity with low activation energy (~0.3 eV, similar to solid
acid protonic conductors), making it especially suitable for
low-temperature SOFC applications. The difference in the ionic
conductivity measured from epitaxial thin films and membranes could
be related to contributions from grain boundaries. Grain
boundaries may not only decrease proton mobility by scattering and
trapping, but may also reduce the proton concentration proximal to
the boundaries by creating space charge layers. Therefore the total
ionic resistance of polycrystalline samples can be larger than that of
the epitaxial films.
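
An activation energy such as the ~0.3 eV quoted above is typically extracted from the slope of an Arrhenius plot like Fig. 2c. The sketch below assumes the common form σT = A exp(−E_a/k_B T) and recovers E_a from synthetic data by a least-squares fit of ln(σT) against 1/T. The prefactor, the temperature grid and the function names are illustrative assumptions; the source does not say which Arrhenius form was used.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_sigma(temp_k, prefactor, e_a_ev):
    """Ionic conductivity assuming sigma*T = A * exp(-E_a / (k_B * T))."""
    return (prefactor / temp_k) * math.exp(-e_a_ev / (K_B_EV * temp_k))


def fit_activation_energy(temps_k, sigmas):
    """Least-squares slope of ln(sigma*T) versus 1/T, converted to E_a in eV."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(s * t) for s, t in zip(sigmas, temps_k)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return -slope * K_B_EV


# Synthetic data over 300-500 degrees C with a hypothetical prefactor.
temps = [573.15, 623.15, 673.15, 723.15, 773.15]
sigmas = [arrhenius_sigma(t, prefactor=1.0e3, e_a_ev=0.3) for t in temps]
e_a = fit_activation_energy(temps, sigmas)
print(f"fitted E_a = {e_a:.3f} eV")
```

A lower activation energy gives a flatter line on the 1,000/T axis, which is why a value near 0.3 eV favours operation at the low end of the SOFC temperature range.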

Several factors may collectively lead to the high ionic conductivity
with low activation energy in SNO. First, it has been found that in RNO,
Ni forms a covalent bond with O in a mixed electron configuration of
3d⁷ and 3d⁸L (where L denotes a ligand hole on O 2p) (ref. 24). The
covalence reduces the effective charge on oxygen and therefore the
bonding strength between the oxygen ion and the proton, which lowers
the proton transfer activation energy (Fig. 1c). Additionally, the proton
transport barrier in perovskites with a tetravalent B-site (A(II)-B(IV))
is in general much smaller than the ones with a pentavalent B-site
(A(I)-B(V)). It has been suggested that A(III)-B(III) perovskites
may have even higher ionic conductivity. This may be explained by
the weaker repulsion between B-site ions and protons in A(III)-B(III)
perovskites, which reduces the energy of the proton in its transition state.
Finally, as the transition states of the proton rotation and proton transfer
require local lattice distortions such as elongation and bending of the
B-O bond, respectively, the relatively low energy of the Ni-O bending
and stretching modes in SNO (~35 meV and 75 meV, respectively)
can also contribute to lowering the proton transport barrier.

To confirm the electron localization mechanism during fuel cell 
operation and to reveal the underlying reasons for the high ionic 
conductivity, both chemical and structural characterizations of the 
SNO hydrogenation process were performed. Ex situ X-ray absorption 
near-edge spectroscopy (XANES) measurements of the nickel K-edge 
from a pristine and a hydrogenated SNO sample are shown in Fig. 3a, 
as well as that from a reference nickel metal sample used for energy 
calibration. Several features are present in the spectra of SNO and 
H-SNO. The pre-edge feature, A, originates from the dipolar transition
between Ni 1s and Ni 3d-O 2p hybridized 3d⁸L, and points to the
covalent nature of the Ni-O bond. Features B, D and E are derived
from the first oxygen coordination shell, while C and C’ originate from
the second shell of the rare-earth ions.
Figure 2 | Performance of the emergent-phase electrolyte in fuel cells.
a, Typical current density-voltage characteristics and the power densities
of Pt/H-SNO/Pt micro-fabricated SOFCs measured at 500°C with 3%
humidified 5% H2-95% Ar as fuel and laboratory air as oxidant. The
electrolyte thickness is 1.5 μm for cell 1, and 1 μm for cells 2 and 3. b, A
Nyquist plot measured under OCV conditions at 500°C for a Pt/SNO/Pt
cell (solid line shows the fitted curve). Z is the complex impedance
measured from the fuel cell. The Nyquist plot can be modelled by
an equivalent circuit (inset) with an ohmic resistor, R_ohm, and serial
elements each consisting of a resistor, R_i (i = 1, 2, 3), and a constant
phase element, CPE_i (i = 1, 2, 3). c, The ionic conductivity of H-SNO
compared to the best-performing oxygen-ion-conducting electrolytes
(dashed lines) and proton conductors (solid lines). The
oxygen-ion-conducting electrolytes are: stabilized zirconia (YSZ, (ZrO2)0.9(Y2O3)0.1)
(ref. 17), La0.8Sr0.2Ga0.8Mg0.2O3 (LSGM) (ref. 18) and doped ceria (GDC,
Ce0.8Gd0.2O1.9−δ) (ref. 19). The proton conductors are BaZr0.8Y0.2O3−δ
(BZY, in the form of both sintered pellets and highly textured films)
(ref. 20) and BaCe0.8−xZrxY0.2O3−δ (BCY, 0<x<0.8) (ref. 21).
[Figure 2 axis labels: b, Re(Z) (Ω cm²); c, 1,000/T (K⁻¹).]

A substantial shift of the absorption edge to a lower energy is
observed upon hydrogenation. The energies of the absorption edge
and other features are consistent with those of Ni(III) in RNO. From
the inflection point in the first derivative of the absorption (Fig. 3b),
the chemical shift from SNO to H-SNO is determined to be ~2.0 eV.
A linear relation between the absorption edge and the formal valence
state has been previously noted with a slope from ~1.5 eV per electron
to ~2.8 eV per electron. The absorption inflection point, however,
depends on both the Ni valence state and the atomic arrangement
around Ni. In this study, the valence change is the primary factor
leading to the absorption edge shift, and the Ni valence state change is close
to −1, suggesting high proton concentration in H-SNO without
introducing impurity dopants. The change of Ni valence state verifies that
hydrogen exists as protons in H-SNO, because it is more favourable
for Ni to accept an electron from hydrogen rather than from O²⁻ and
Sm³⁺ ions when changing from SmNi(III)O3 to H-SmNi(II)O3. The
angle-resolved XANES spectra show that the proton incorporation
not only happens at the surface but also through the thickness of the
films (Extended Data Fig. 8). In addition, the decrease in the white
line intensity suggests an overall decrease in the hole density on Ni
after hydrogenation.
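
The valence estimate drawn from the absorption-edge shift can be checked with simple arithmetic. The sketch below assumes the linear edge-energy versus valence relation mentioned in the text and divides the ~2.0 eV downward shift by the two quoted slopes. The function name is hypothetical, and the calculation ignores the structural contribution to the edge position that the text also notes.

```python
def valence_change(edge_shift_ev, slope_ev_per_electron):
    """Valence change implied by a downward edge shift under a linear relation."""
    return -edge_shift_ev / slope_ev_per_electron


EDGE_SHIFT_EV = 2.0             # downward shift from SNO to H-SNO (from the text)
SLOPE_RANGE_EV = (1.5, 2.8)     # eV per electron, range quoted in the text

bounds = [valence_change(EDGE_SHIFT_EV, s) for s in SLOPE_RANGE_EV]
print(f"valence change between {min(bounds):.2f} and {max(bounds):.2f}")
```

The resulting window of roughly −1.3 to −0.7 brackets −1, consistent with a change from Ni(III) towards Ni(II).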


The intensity of the pre-edge feature A, offset with respect to
feature B by ~14.4 eV, represents the density of the 3d⁸L state
in the ground state and decreases upon hydrogenation (inset of
Fig. 3a), which shows that the doped electrons partially fill the ligand
holes. Following the analysis in ref. 27, we find the pristine ground
state to be ~0.5|3d⁷⟩ + 0.5|3d⁸L⟩, and estimate that the concentration
of ligand holes decreases by ~50% after the hydrogenation. This
verifies that ligand holes are present on oxygen ions in both SNO
and H-SNO, which helps to reduce the proton transfer activation
energy.

Figure 3c shows the representative in situ XANES spectra during the 
hydrogenation process. The chemical shift is smaller than those of the 
ex situ experiments owing to the lower operation temperature limited 
by the apparatus. The dynamic change in the absorption edge (inset 
of Fig. 3c) shows that the average valence state reaches equilibrium in 
~30 min at 200°C. 

Synchrotron X-ray diffraction studies (Extended Data Figs 9 and 10) 
suggest that the SNO lattice expands during hydrogenation, which 
may lead to a change in the relative rate of inter- and intra-octahedron 
proton transfer and the lattice open volume, and therefore modify the
long-range proton transport properties. 
Figure 3 | Ex situ and in situ XANES characterizations of the phase
evolution. a, Ex situ normalized Ni K-edge XANES spectra of SNO,
H-SNO and the nickel metal reference, with zoomed view of the pre-edge
feature A (inset). The other features (B, C, C’, D and E) are derived from
the first oxygen coordination shell and the second shell of the rare-earth
ions. b, First derivative of the normalized absorption. c, In situ XANES
spectra of the SNO hydrogenation process performed at 200°C. The arrow 
indicates the direction of time evolution. The dynamics of the shift in the 
energy of the absorption edge EF; is shown in the inset (where Ei. is the 
absorption edge energy of pristine SNO). 
METHODS

Micro-fabricated SOFC. 4-inch-diameter, 525-μm-thick Si (100) wafers coated
with 200 nm Si3N4 on both sides were used as substrates for micro-fabrication of
an SOFC. One side of the wafer was patterned with photolithography to define
silicon nitride areas uncovered by photoresist. Then the uncovered silicon nitride
was removed by reactive ion etching in CF4 and O2. Afterwards, the exposed
Si was etched with a 30 wt% KOH aqueous solution at 86°C for ~5h to leave a
160 × 160 μm² free-standing Si3N4 membrane. After KOH etching, the 4-inch
wafers were cut into 1 × 1 cm² chips, with nine windows on each of the chips. SNO
electrolyte films with thicknesses ranging from 50 nm to 1.5 μm were deposited
onto the silicon nitride membranes by radio-frequency magnetron sputtering in
an Ar/O2 mixture at a total pressure of 5 mTorr, either from a ceramic SNO target,
or from two metallic Ni and Sm targets. For films sputtered from metal targets,
the chips were annealed under 100 bar of pure O2 at 500°C for 24h so that SNO
would form the perovskite phase after annealing. The growth rate of SNO was
calibrated by X-ray reflectivity, cross-section transmission electron microscopy
and scanning electron microscopy. The Sm:Ni cation ratio was determined by
energy-dispersive X-ray spectroscopy. Then, the Pt cathode was deposited by
magnetron sputtering in pure Ar at a total pressure of 75 mTorr, which yields a
porous Pt layer to increase the size of the triple phase boundaries on the anode
and cathode side. The Si3N4 layer on the backside was removed by reactive ion
etching in CF4 and O2. Finally, a Pt anode layer was deposited into the Si well side
using sputtering under the same conditions as for the cathode. Dense palladium
films of thickness 100-200 nm were also used as the anode, which were deposited
by an electron-beam evaporator. The detailed fabrication process is shown in
Extended Data Fig. 3.
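
The geometry of the KOH back-etch can be estimated with a short calculation. The sketch below assumes ideal anisotropic etching of (100) silicon, in which the sidewalls are {111} planes inclined at arctan(√2) ≈ 54.74° to the wafer surface, with no mask undercut and with the thin nitride layers neglected. Under those assumptions it gives the back-side mask opening needed for a 160 μm membrane through a 525 μm wafer, and the mean etch rate implied by the ~5 h etch time. The function names are hypothetical and the mask dimension is not stated in the source.

```python
import math

# Angle between the {111} sidewalls and the (100) wafer surface (~54.74 degrees).
SIDEWALL_ANGLE_RAD = math.atan(math.sqrt(2.0))


def mask_opening_um(membrane_um, wafer_thickness_um):
    """Back-side mask opening for an ideal anisotropic KOH etch of (100) Si."""
    return membrane_um + 2.0 * wafer_thickness_um / math.tan(SIDEWALL_ANGLE_RAD)


def mean_etch_rate_um_per_h(wafer_thickness_um, etch_time_h):
    """Average vertical etch rate if the full wafer thickness is removed."""
    return wafer_thickness_um / etch_time_h


opening = mask_opening_um(160.0, 525.0)
rate = mean_etch_rate_um_per_h(525.0, 5.0)
print(f"mask opening ~ {opening:.0f} um, mean etch rate ~ {rate:.0f} um/h")
```

The sloped sidewalls mean that each membrane window needs a back-side opening several times wider than the membrane itself, which limits how closely the windows can be packed on a chip.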

Synthesis of SNO epitaxial thin films. The epitaxial SNO thin films were grown
on LAO (001) by radio-frequency magnetron sputtering in an Ar/O2 mixture at
total pressure of 5 mTorr from two metallic Ni and Sm targets. The samples were
sealed in a vessel under 100 bar of pure O2 and annealed at 500°C for 24h in a
tube furnace.

Electrical and electrochemical characterization. Fuel-cell tests were performed 
in a custom-designed fuel cell test station. The morphology of the membranes during
fuel cell testing was monitored in situ under an optical microscope. Anode current 
was collected with a gold O-ring and a stainless-steel base, and the cathode current 
was collected through a micromanipulator probe with a Pt-plated tungsten tip. 
The electrochemical active area for fuel cell performance was defined as the area 
of the free-standing SNO membrane. For epitaxial thin films, the conductivity 
measurements were done using in-plane geometry with porous Pt electrodes. The 
current-voltage characteristics were measured by starting at an OCV and sweeping 
down to 0 V at a rate of 20 mV s⁻¹ (or 10 mV s⁻¹). Electrochemical impedance
spectroscopy was scanned from 10°Hz to 1 Hz with an amplitude of 20 mV. All 
the electrochemical measurements were performed with a Solartron 1260/1287 
electrochemical test setup. The impedance data were fitted using ZView software. 
For Pt/SNO/Pt fuel cells, either dry or moist 5% H2/95% Ar was flowed onto the
anode side. For Pt/SNO/Pd fuel cells, pure H2 bubbled through room-temperature
water was flowed onto the Pd anode. In both cases, stationary air was used as the
cathode oxidant. The ionic conductivity of epitaxial thin films was measured in 
dry 5% H2/95% Ar and the conductivity of suspended membranes in Pt/H-SNO/
Pt SOFCs was measured with 3% humidified 5% H2/95% Ar as fuel and laboratory 
air as oxidant. 

In situ conductivity dynamics measurements were performed with a Keithley
2635A and Solartron 1260/1287 in a custom-built chamber by switching between
dry 5% H2/95% Ar and O2 with a fixed flow rate of 150 standard cubic centimetres
per minute (sccm). Electronic transport studies below room temperature were
performed in vacuum ex situ using a Lakeshore probe-station and a Keithley 2635A
on samples annealed in dry 5% H2/95% Ar for 30 min at 200°C.

X-ray absorption spectroscopy studies. The X-ray absorption spectroscopy 
data were acquired at the bending magnet beamline, 12-BM-B, at the Advanced 
Photon Source, Argonne National Laboratory. The absorption was measured 
in fluorescence mode with the samples placed in a custom-made cell allowing 
in situ control of the atmosphere and heating of the sample. An infrared heater was
used to heat the sample up to 200°C. A 13-element Ge detector (Canberra) was
used to measure the fluorescence yield. Grazing incidence geometry was used to
minimize the elastic scattering intensity. The incident angle was varied from 0.25°
to 5°, covering a range below and above the critical angle. The calibration of the
monochromator was monitored by simultaneously measuring the absorption of a
nickel reference foil during each measurement. For ex situ XANES measurements,
SNO samples were annealed in dry 4% H2/96% Ar for 30 min at 200°C or 300°C. The
data were normalized by fitting the pre-edge to zero and the post-edge to 1 using 
Ifeffit performed by the software Athena (http://cars9.uchicago.edu/ifeffit/Ifeffit). 
Both epitaxial SNO thin films of different thickness on LAO and polycrystalline 
SNO thin films on SiO2/Si were characterized by XANES. 

Synchrotron X-ray diffraction. Synchrotron X-ray diffraction of the SNO samples
was conducted at an insertion device beamline, 12ID-D at the Advanced Photon
Source on a six-circle Huber goniometer with an X-ray energy of 20 keV using 
a pixel array area detector (Dectris Pilatus 100 K). The X-ray beam had a flux 
of 10!” photons per second. The q,-scan (L-scan) was obtained by removing the 
background scattering contributions using the two-dimensional images. For ex situ 
X-ray diffraction measurements, SNO samples were grown on LAO substrates and 
annealed in 5% H2/95% Ar at 300°C for 2h. For the real-space mapping shown
in Extended Data Fig. 9d-f, an X-ray footprint of 50 μm (horizontal in Extended
Data Fig. 9b) × 500 μm (vertical in Extended Data Fig. 9b) was used to scan across
the sample, collecting the diffraction pattern from each point. 

SNO stability test. To test the material stability in a pure hydrogen atmosphere, 
we annealed SNO thin films under 1 bar of pure H2 in a tube furnace at 500°C for
48h and 72h. Pt electrodes were deposited onto SNO thin films as catalyst. The
H2 flow was set to a constant 300 sccm. X-ray diffraction was performed on the
annealed samples using a Bruker-D8 Discover diffractometer. 
Extended Data Figure 1 | The electronic structure of SNO and H-SNO
in the covalent limit and their electronic transport mechanisms.
a, The electronic structure of SNO in the covalent limit. Ligand holes are
present on O 2p orbitals of the pristine SNO, while two electrons occupy
the Ni e_g manifold. The pristine SNO, however, is not strongly correlated
because carriers transport through the O 2p ligand holes. b, The electronic
structure of H-SNO. Upon electron doping and thus filling of the ligand
holes, electrons have to overcome Hubbard intra-orbital correlation U to
transport, which opens up a large Mott gap, and suppresses the electronic
conduction in SNO. c, The resistivity ρ of H-SNO compared with pristine
SNO. The resistivity of H-SNO is more than eight orders of magnitude
larger than that of pristine SNO at room temperature. d, e, Derivatives
of resistivity (−d ln ρ/d ln T) as a function of T plotted in log-log scale
for H-SNO and SNO. The transport mechanism can be determined
from the slope p of the −d ln ρ/d ln T versus T curves. H-SNO shows the
Efros-Shklovskii variable range hopping mechanism (p = 1/2), indicating
polaron formation in the presence of a Coulomb gap (d). Pristine SNO
shows crossover from activated conduction (p = 1) to Mott variable range
hopping (p = 1/4) (e). The Coulomb repulsion is less strong in pristine
SNO.
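
The slope analysis described in this caption can be sketched on synthetic data. The sketch assumes the generic hopping law ρ(T) = ρ0 exp[(T0/T)^p], for which w = −d ln ρ/d ln T = p (T0/T)^p, so that ln w plotted against ln T is a straight line of slope −p. The values of ρ0 and T0, the temperature grid and the function names are hypothetical; no measured data from the paper are used.

```python
import math


def resistivity(temp_k, rho0, t0, p):
    """Generic hopping law rho = rho0 * exp((T0 / T)**p)."""
    return rho0 * math.exp((t0 / temp_k) ** p)


def hopping_exponent(temps_k, rhos):
    """Estimate p from the slope of ln(w) versus ln(T), w = -dln(rho)/dln(T)."""
    ln_t = [math.log(t) for t in temps_k]
    ln_rho = [math.log(r) for r in rhos]
    xs, ys = [], []
    for i in range(1, len(temps_k) - 1):
        # Central finite difference for the logarithmic derivative.
        w = -(ln_rho[i + 1] - ln_rho[i - 1]) / (ln_t[i + 1] - ln_t[i - 1])
        if w > 0:
            xs.append(ln_t[i])
            ys.append(math.log(w))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return -slope


# Geometric temperature grid from 50 K to 300 K (uniform in ln T).
temps = [50.0 * (300.0 / 50.0) ** (i / 99) for i in range(100)]
es_like = [resistivity(t, rho0=1e-3, t0=1e4, p=0.5) for t in temps]
mott_like = [resistivity(t, rho0=1e-3, t0=1e6, p=0.25) for t in temps]
print(f"p (Efros-Shklovskii-like data) = {hopping_exponent(temps, es_like):.3f}")
print(f"p (Mott-like data)             = {hopping_exponent(temps, mott_like):.3f}")
```

With real measurements the derivative amplifies noise, so the data are usually smoothed or binned before the slope is fitted; that step is omitted here.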
Extended Data Figure 2 | Fuel-induced suppression of electronic 
conduction in SNO. a, Temporal evolution of SNO conductivity when 
switching between different gas environments at various temperatures. 
b–d, Images of SNO and H-SNO on transparent substrate LAO. b, Pristine 
SNO shows dark, shining colour and the Pt bars are bright. c, After 
annealing in 5% H2/95% Ar at 300 °C for 1.5 h and cooling down to room 
temperature in the same gas environment, SNO near the Pt electrodes 
becomes electronically insulating and transparent. A clear diffusion profile 
can be seen as the transparent region has a shape similar to the outline 
of the Pt electrodes. d, An optical micrograph of the hydrogenated SNO 
indicates a diffusion profile of protons from the triple phase boundaries. 
The diffusion length $L_\mathrm{D}$ is estimated to be ~300 µm. 
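
As a rough illustration of what such a diffusion length implies, the sketch below converts a length and an annealing time into an order-of-magnitude diffusion coefficient. It assumes the scaling $L = \sqrt{c\,D\,t}$ with a convention-dependent prefactor c (1 by default), a diffusion length of about 300 µm, and that the full 1.5 h anneal counts as diffusion time. None of these choices is stated in the caption, and the function name is hypothetical.

```python
def diffusion_coefficient(length_um, time_h, prefactor=1.0):
    """Order-of-magnitude D in cm^2/s from L = sqrt(prefactor * D * t)."""
    length_cm = length_um * 1e-4   # micrometres to centimetres
    time_s = time_h * 3600.0       # hours to seconds
    return length_cm ** 2 / (prefactor * time_s)


# Assumed inputs: ~300 micrometres after a 1.5 h anneal.
d_estimate = diffusion_coefficient(300.0, 1.5)
print(f"D ~ {d_estimate:.1e} cm^2/s")
```

Using a prefactor of 4 instead of 1 (the $L = 2\sqrt{Dt}$ convention) lowers the estimate by a factor of four, so only the order of magnitude is meaningful.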
Extended Data Figure 3 | A schematic of the fabrication process of fuel 
cells with free-standing SNO membranes as the electrolyte. a, Patterning 
an etch mask on the back side of the Si3N4/Si/Si3N4 chip with photoresist 
(PR). b, Removing exposed silicon nitride by reactive ion etching in CF4 
and O2. c, Etching the Si from the back side with a KOH aqueous solution 
to make a free-standing Si3N4 membrane. d, Depositing SNO thin films 
onto the Si3N4 membranes by radio-frequency magnetron sputtering and 
post-annealing the sample to form stoichiometric SNO. e, Fabricating the 
porous Pt cathodes on the front side of the chip. f, Removing the silicon 
nitride membrane from the back side of the chip to expose SNO, using 
reactive ion etching. g, h, Depositing anodes on the back side of the chip. 
Two types of fuel cell anodes were studied in this work: porous Pt as a 
model system (g) and a dense Pd anode (h). Pd is an industry-standard 
proton-conducting membrane that is used in this study to selectively 
permeate protons from the fuel side to the cathode. 
Extended Data Figure 4 | SNO micro-fabricated SOFCs and fuel cell test 
apparatus. a, An image of a 10 mm × 10 mm Si3N4/Si chip with nine 
SNO-electrolyte fuel cells (US dime coin shown for size). b, c, Optical 
micrographs of the free-standing, buckled SNO membrane with top Pt 
cathode on a Si chip. The buckled morphology is due to local compressive 
strain, engineered intentionally by synthesis, and is critical for the 
mechanical stability and performance of the SOFC. d, A scanning electron 
micrograph of the top porous Pt electrode. e, A schematic of the customized 
low-temperature micro-fabricated SOFC (µSOFC) testing station. Both pure 
H2 and 5% H2/95% Ar were used as fuel in the experiments. 
Extended Data Figure 5 | OCV of H-SNO micro-fabricated SOFCs. 
a, Temporal evolution of the OCV of a Pt/H-SNO/Pt micro-fabricated 
SOFC with 3% humidified 5% H2/95% Ar as fuel and laboratory air 
as oxidant, as the temperature ramps up. Initially SNO is electronically 
conductive so the OCV is close to zero. During the hydrogenation process, 
the OCV continues to increase after the temperature is stabilized and 
reaches near-ideal OCV, indicating that electronic conduction is almost 
completely suppressed in H-SNO by the Mott transition. The hydrogen 
fuel was always supplied at a constant flow rate both before t=0 and 
during the experiments, and the initial low OCV is not due to the lack of 
fuel. b, The ionic transference number of H-SNO at various temperatures 
of interest to low-temperature SOFCs measured in Pt/SNO/Pt cells. 

Two methods can be used to calculate the ionic transference number. 
In the electromotive force (E.M.F.) method, the fuel cell under the OCV 
condition (infinitely large external resistance load) is modelled with an 
equivalent circuit containing a voltage source with an output voltage of 
Nernst potential $E_\mathrm{N}$, and two resistors $R_\mathrm{ion}$ and $R_\mathrm{e}$, which correspond to 
the electrolyte’s ionic resistance and electronic resistance, respectively. 
The equivalent circuit is similar to the one shown in the inset, but without 
the $R_\mathrm{polarization}$ element (drawn in red). $V_\mathrm{OC}$ is the measured OCV. Note 
that there will be a small leakage current $i_\mathrm{leak}$ due to the finite electronic 
resistance of the electrolyte, but the electromotive force method assumes 
that the interface processes are infinitely fast and omits the polarization 
loss. In the method developed by Liu et al.’*, since there is a very small 
leakage electronic current flowing through the electrolyte, one needs 
to consider the electrode polarization loss. Therefore, an extra resistive 
element ($R_\mathrm{polarization}$) needs to be considered in the equivalent circuit as 
shown in the inset (the red-coloured element corresponds to the extra 
term). With reduced polarization and increased electrolyte resistance, 
the ionic transference number calculated by the two methods tends to 
converge (see Supplementary Information for more discussion). 
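
The difference between the two estimates can be illustrated with a small circuit calculation. This is a sketch under stated assumptions rather than the authors' procedure: the cell is treated as a single series loop (Nernst source, ionic resistance, optional polarization resistance, electronic leak resistance), the OCV is taken as the voltage across the electronic leak, and all resistance values and function names are hypothetical.

```python
def open_circuit_voltage(e_nernst, r_ion, r_e, r_pol=0.0):
    """Voltage across the electronic leak resistor in the series-loop model."""
    return e_nernst * r_e / (r_ion + r_pol + r_e)


def t_ion_emf(v_oc, e_nernst):
    """E.M.F. estimate of the ionic transference number (ignores polarization)."""
    return v_oc / e_nernst


def t_ion_circuit(r_ion, r_e):
    """Ionic transference number defined from the two electrolyte resistances."""
    return r_e / (r_ion + r_e)


E_N = 1.1  # volts, illustrative value only
for r_pol in (10.0, 1.0, 0.0):  # arbitrary units, illustrative values only
    v_oc = open_circuit_voltage(E_N, r_ion=1.0, r_e=100.0, r_pol=r_pol)
    print(
        f"R_pol={r_pol:5.1f}  E.M.F. estimate={t_ion_emf(v_oc, E_N):.3f}  "
        f"circuit value={t_ion_circuit(1.0, 100.0):.3f}"
    )
```

In this model the E.M.F. estimate falls below the circuit value whenever the polarization resistance is non-zero, and the two agree as that resistance becomes small relative to the electrolyte resistance, in line with the convergence noted in the caption.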
Extended Data Figure 6 | H-SNO fuel cell performance. a, The 
dependence of micro-fabricated SOFC performance on the thickness 
of the SNO electrolyte at 500 °C. We fabricated a series of samples with 
various thicknesses of the electrolyte while keeping identical deposition 
conditions for the cathode and anode. By doing so, the electrolyte Ohmic 
resistance is varied while the electrode polarization resistance is kept 
roughly constant. A clear increase in OCV with increasing thickness can 
be seen, which could be due to the decrease in the electrode polarization 
loss because of the larger electrolyte Ohmic resistance, as discussed in 
Extended Data Fig. 5. The power density does not show much dependence 
on the electrolyte thickness, because thicker electrolytes lead to higher 
Ohmic resistance, but also higher OCV. b, Performance of Pt/SNO/Pd 
micro-fabricated SOFCs with a dense Pd anode with 3% humidified pure 
H2 as fuel and laboratory air as oxidant. It has been shown that hydrogen 
primarily creates protonic defects rather than oxygen vacancies in SNO 
(ref. 12). To verify that protons are the dominant mobile ion species in 
SNO and H-SNO, we fabricated an SOFC with the SNO electrolyte, a 
dense 100-nm-thick Pd anode, and a porous 100-nm-thick Pt cathode. 
The Pd anode is known to conduct protons but block oxygen ions, 
and can therefore filter out any oxygen ion transport. This verifies that 
protons rather than oxygen ions are the dominant mobile ions in SNO. 
During the fuel cell testing, 100 sccm pure H2 was flowed on the anode 
side, with the cathode exposed to air. The fuel cell with dense Pd has an 
OCV of 0.6 V and a peak power density of 24 mW cm$^{-2}$ at 500 °C. The 
protonic conductivity of H-SNO can be extrapolated from impedance 
spectroscopy and OCV measurements. The similar values of the measured 
ionic conductivity in cells with Pt and Pd anodes confirm that protonic 
conduction is the dominant ionic transport mechanism. 
Extended Data Figure 7 | Stability of H-SNO. a, Cell voltage measured 
at 500 °C for a Pt/SNO/Pd fuel cell with wet 100% H2 as the fuel and 
stationary air as oxidant with current being 0 mA cm$^{-2}$ (OCV condition) 
and 78 mA cm$^{-2}$, respectively. The operation is stable for more than 
20 h, implying that H-SNO exhibits considerable stability for fuel cell 
operation. The power output decreases slightly as a function of time 
owing to coarsening-induced porosity reduction of the metallic electrodes 
when current is drawn at 500 °C. b, X-ray diffraction pattern of SNO, and 
H-SNO (on LAO substrates) after being annealed under 1 bar of pure H2 
at 500 °C for 48 h and 72 h. No new diffraction peaks are observed after 
annealing, which shows that H-SNO is quite stable in pure H2 for extended 
periods of time. $\theta$ is the incident angle of the X-ray. 
Extended Data Figure 8 | Angle-dependent XANES characterization. 
a, Ex situ angle-dependent XANES spectra of hydrogenated SNO with 
a reference spectrum from pristine SNO. The critical angle $\theta_\mathrm{c}$ of X-ray 
scattering for SNO at the X-ray energy near Ni K-edge is calculated 
to be 0.335°. When the X-ray incident angle is below the critical angle 
(0.25°), the XANES signal is surface sensitive with a penetration depth 
of ~10 nm. For an incident angle of 5°, the penetration depth is close to 
1 µm. The absence of angle-dependence of the XANES spectra shows that 
the hydrogen incorporation happens almost homogeneously across the 
film thickness. The XANES spectrum acquired at incident angle of 1° (not 
shown) is also similar to those at 0.25° and 5°. b, The first derivative of the 
normalized absorption shows a similar change in the average valence state 
of Ni at the film surface and in the bulk. 
Extended Data Figure 9 | Synchrotron structural characterization of the 
emergent SNO phase. a, An increase in the lattice constant can be caused 
by the larger crystal radius of Ni$^{2+}$ and electron localization. When the 
formal valence state of Ni reduces, its ionic radius $R_\mathrm{Ni}$ increases, leading 
to the elongation of the Ni–O bond. In addition to the simple 
valence-state-related lattice expansion, electron localization can also increase the 
metal–oxygen bond length, which can be understood on the basis of the 
virial theorem for central-force fields: $2\langle T\rangle + \langle V\rangle = 0$, where $\langle T\rangle$ is the 
mean kinetic energy of electrons, and $\langle V\rangle$ is the average potential energy. 
When transiting from itinerant to localized electronic behaviour, the 
absolute value $|\langle V\rangle|$ must decrease, which is achieved by a longer 
metal–oxygen bond length*’, that is, (Ni–O)$_\mathrm{loc}$ exceeds (Ni–O)$_\mathrm{itin}$ even for the 
same valence state. b, An optical image of a hydrogenated SNO sample. 
The H-SNO phase forms near and under the Pt electrodes, while a part of the 
sample remains in its pristine phase. c, X-ray diffraction patterns from 
the various spots A, B, C and D marked in b. The SNO and LAO peaks are 
indexed in pseudocubic notation. As the pristine SNO has a pseudocubic 
lattice constant close to that of the LAO, the SNO (002) appears almost as 
a shoulder of the LAO (002) peak. With decreasing distance between the 
X-ray spot and Pt electrodes, SNO (002) indeed shifts to smaller $q_z$ (no 
other peaks observed). Two peaks (peak 1 at $q_z = 3.18$ Å$^{-1}$ and peak 2 at 
$q_z = 2.98$ Å$^{-1}$) appear in the hydrogenated region and correspond to ~4% 
and ~10% increase in the lattice constant. Peak 2 has the largest intensity 
right underneath the Pt catalyst, while peak 1 has the highest intensity far 
away from the Pt electrodes. The difference in the lattice constant change 
can be related to the decreasing doping concentration with increasing 
diffusion length from the triple phase boundary where hydrogen enters 
SNO (Extended Data Fig. 10c). d–f, Real-space mapping of the intensity of 
the Pt (111) peak at $q_z = 2.78$ Å$^{-1}$ (d), the H-SNO peak 1 at $q_z = 3.18$ Å$^{-1}$ 
(e) and the H-SNO peak 2 at $q_z = 2.98$ Å$^{-1}$ (f). A clear positive correlation 
between the Pt (111) and the $q_z = 2.98$ Å$^{-1}$ peaks can be seen, whereas 
the Pt (111) and $q_z = 3.18$ Å$^{-1}$ peaks show a negative correlation. The 
intensity of both peaks 1 and 2 is low in the pristine region, as expected. 
The increase in the average Ni–O bond length can also be inferred 
from XANES spectra using Natoli’s rule*’, which states that the energy 
separation between features B, D, and E will scale inversely with the square 
of the Ni–O distance, because they are derived from the first oxygen 
coordination shell?’. 
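
The link between a peak position and a lattice constant can be made explicit with a short calculation. The sketch below assumes the peaks are (002) reflections indexed in pseudocubic notation, so that the out-of-plane lattice constant is $c = 2\pi l/q_z$ with $l = 2$. The pristine reference value of 3.80 Å is an assumption introduced here for illustration (the caption only says it is close to that of LAO), and the function names are hypothetical.

```python
import math


def lattice_constant(q_z, l=2):
    """Out-of-plane lattice constant (angstrom) from a (00l) peak at q_z (1/angstrom)."""
    return 2.0 * math.pi * l / q_z


def expansion_percent(q_z, c_ref, l=2):
    """Relative lattice expansion with respect to a reference constant c_ref."""
    return 100.0 * (lattice_constant(q_z, l) / c_ref - 1.0)


C_REF = 3.80  # angstrom; assumed pristine pseudocubic value, not from the caption
for label, q_z in (("peak 1", 3.18), ("peak 2", 2.98)):
    c = lattice_constant(q_z)
    print(f"{label}: c = {c:.3f} A, expansion = {expansion_percent(q_z, C_REF):.1f}%")
```

The percentages depend on the reference value chosen, so they should be read as indicative only; the ordering (peak 2 corresponding to the larger expansion) does not.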
Extended Data Figure 10 | Raw X-ray diffraction patterns and a 
schematic of proton diffusion. a, b, The collected raw two-dimensional 
diffraction patterns for the real-space mapping in Extended Data Fig. 9. 
To get the real-space mapping of different peaks across the sample, we 
scan the sample with an X-ray footprint of 50 µm (horizontal in Extended 
Data Fig. 9b) × 500 µm (vertical in Extended Data Fig. 9b) by collecting 
the diffraction pattern from each point with an area detector. Then we 
calculate the diffraction intensity of each peak (Pt (111), and peaks 1 and 2 in 
Extended Data Fig. 9c) at each real-space spot from the two-dimensional images and 
map it into real space to create Extended Data Fig. 9d–f. a, Diffraction 
pattern of Pt (111) from a spot on the Pt electrode. A diffraction ring is 
observed as Pt is polycrystalline. b, Diffraction pattern at $q_z = 3.18$ Å$^{-1}$ 
from a spot between the Pt electrodes. Unlike the Pt pattern, it shows up as 
a point with a truncation rod rather than a ring in k-space, indicating that 
H-SNO is still epitaxial on LAO after hydrogenation. For both a and b the 
region inside the white dashed line was used to calculate the signal, while 
the region enclosed by the red dashed line but not by the white dashed line 
was used to calculate the background along both the q, and q, directions. 
The signal/background region and calculation algorithm were kept the 
same for all the real-space spots measured on the sample for a particular 
spot in the reciprocal space. c, A schematic of proton incorporation and 
diffusion near Pt electrodes. The part of SNO directly underneath the 
porous Pt electrodes is on average closer to the triple phase boundaries 
(TPBs) than the SNO region between the Pt electrodes. Therefore, a 
higher concentration of protons is expected under the Pt electrodes, which 
explains the larger lattice constant change and the correlation relation 
found in Extended Data Fig. 9. As the thickness of the film (z ≈ 100 nm) 
is much smaller than the diffusion length (hundreds of micrometres), the 
proton concentration should not vary much along the thickness direction 
for the case of epitaxial thin films on LAO. 
An autonomous chemically fuelled small-molecule motor 

Miriam R. Wilson, Jordi Sola, Armando Carlone, Stephen M. Goldup, Nathalie Lebrasseur & David A. Leigh 


Molecular machines are among the most complex of all functional 
molecules and lie at the heart of nearly every biological process!. A 
number of synthetic small-molecule machines have been developed’, 
including molecular muscles**, synthesizers*°, pumps’~°, walkers!°, 
transporters! and light-driven!?-'° and electrically!” driven rotary 
motors. However, although biological molecular motors are powered 
by chemical gradients or the hydrolysis of adenosine triphosphate 
(ATP)!, so far there are no synthetic small-molecule motors that 
can operate autonomously using chemical energy (that is, the 
components move with net directionality as long as a chemical fuel 
is present) !°. Here we describe a system in which a small molecular 
ring (macrocycle) is continuously transported directionally around 
a cyclic molecular track when powered by irreversible reactions of 
a chemical fuel, 9-fluorenylmethoxycarbonyl chloride. Key to the 
design is that the rate of reaction of this fuel with reactive sites on 
the cyclic track is faster when the macrocycle is far from the reactive 
site than when it is near to it. We find that a bulky pyridine-based 
catalyst promotes carbonate-forming reactions that ratchet the 
displacement of the macrocycle away from the reactive sites on 
the track. Under reaction conditions where both attachment and 
cleavage of the 9-fluorenylmethoxycarbonyl groups occur through 
different processes, and the cleavage reaction occurs at a rate 
independent of macrocycle location, net directional rotation of the 
molecular motor continues for as long as unreacted fuel remains. 
We anticipate that autonomous chemically fuelled molecular motors 
will find application as engines in molecular nanotechnology™!”””. 

The design of nanometre-scale motors in which the components 
incessantly rotate with net directionality has tantalized scientists since 
Feynman's discussion of the physics of a theoretical tiny ratchet-and- 
pawl’!. In the 1990s Kelly’s group produced a series of molecular ana- 
logues of a ratchet-and-pawl, confirming the lack of directional bias 
in the movement of the components at equilibrium’. Their designs 
culminated in a system that employed chemical reactions to bias a 
120° rotation of a triptycene residue in one direction”’, but attempts 
to extend this approach to repetitive 360° directional rotation proved 
unsuccessful**. Light-driven rotary molecular motors based on over- 
crowded alkenes!*!? and imines'*!® have been developed by the 
groups of Feringa and Lehn, while our group*>”° and others””-*” have 
made molecules in which the components can be rotated direction- 
ally stepwise by repetitively carrying out several chemical reactions 
in sequence. The latter systems all operate through Brownian ratchet 
mechanisms, differentiating the rates of random thermal motion 
of the components in each direction by the manipulation of kinetic 
(mainly steric) barriers”°. Autonomous operation requires the ratchet 
mechanism to operate continuously, meaning that the barriers must 
be repeatedly raised and lowered under the same set of reaction 
conditions and coupled to the consumption of a chemical species in order 
to avoid falling foul of the second law of thermodynamics”. 


The structure and mechanism of operation of a rotary molecular 
motor (1) that continuously rotates its components with net direc- 
tionality when driven by chemical energy is shown in Fig. 1. The mol- 
ecule is a [2]catenane featuring two interlocked molecular rings of 
different sizes. Fumaramide residues (shown in green) on the larger 
ring (the ‘track’) serve as binding sites for a smaller benzylic amide 
macrocycle (blue). Removable bulky groups (red) block the passage 
of the small ring and, when both blocking groups are attached, trap 
it in one or other compartment of the cyclic track. As previously 
demonstrated*!*”, the macrocycle can be directionally transported 
between adjacent compartments of a rotaxane thread using the acy- 
lation of hydroxyl groups as the energy input. We reasoned that the 
issue of repeatedly raising and lowering the kinetic barriers to trans- 
port under a single set of reaction conditions could be achieved by 
using a blocking group that attaches and detaches through dissim- 
ilar reaction mechanisms: one reaction (for example, attachment) 
proceeding at rates that vary according to the position of the small 
macrocycle, the other (for example, cleavage) occurring at a rate inde- 
pendent of the small macrocycle position (an ‘information ratchet’ 
mechanism7?°3137), 

In [2]catenane 2/2′, in which one 9-fluorenylmethoxycarbonyl 
group of 1 has been cleaved, the two small-ring binding sites 
(fumaramide groups) lie at substantially different distances from 
the revealed hydroxyl group; one is very close, where the presence 
of the ring should inhibit nucleophilic attack by the OH group on a 
large electrophile, and one too far away for a bound ring to influence 
rates of reaction noticeably. This should result in dissimilar reaction 
rates for when the macrocycle occupies the fumaramide unit near 
($k_\mathrm{close\text{-}attach}$) or far from ($k_\mathrm{far\text{-}attach}$) the hydroxyl group. We carried out 
model studies on a number of potential chemical fuels, eventually 
selecting 9-fluorenylmethoxycarbonyl chloride (Fmoc-Cl) because 
its mechanism of attachment to the molecular motor is very different 
to that of cleavage of the resulting fluorenylmethoxycarbonate group 
(shown for the rotaxane model system in Fig. 2). The former occurs 
by nucleophilic attack of a hydroxyl group directly on the C=O of the 
chloroformate residue, where the presence or absence of the bulky 
benzylic amide macrocycle on the adjacent fumaramide group would 
be expected to influence the reaction rate ($k_\mathrm{far\text{-}attach} \neq k_\mathrm{close\text{-}attach}$). In 
contrast, the detachment reaction occurs by a reaction cascade 
(eliminating CO2 and dibenzofulvene) initiated by base abstraction of a 
proton from the fluorenyl methine group. This is five bonds remote 
from the site of attachment to the [2]catenane and so the influence 
of the position of the macrocycle on the detachment reaction rate 
should be minimal ($k_\mathrm{far\text{-}cleave} \approx k_\mathrm{close\text{-}cleave}$). The reactions that lead to 
the attachment and to the cleavage of the Fmoc group can both be 
promoted under basic conditions. 

Starting from the mono-hydroxyl species 2 and 2′, Fmoc 
attachment to 2 should favour formation of the positional isomer FumD2-1 
(carbonate formation preferentially occurring distant to the small 
ring) and likewise Fmoc attachment to 2′ should preferentially form 
FumH2-1, each reaction causing net transport of the benzylic amide 
macrocycle in a clockwise direction. The cleavage of either of the Fmoc 
groups of 1 then occurs (to form 2 or 2’ in equal amounts), allowing 
another Fmoc attachment step to occur, again proceeding with net 
directional movement of the small ring. To maximize the efficiency of 
the process, sufficient Fmoc-Cl needs to be present for the attachment 
reaction to proceed rapidly whenever a hydroxyl group is unmasked. 
This prevents accumulation of the catenane diol, in which both Fmoc 
groups have been cleaved, leaving the small ring free to shuttle around 
the track without directional bias. A fuller discussion of the kinetics* 
of the information ratchet mechanism, including deriving the net 
directionality of ring rotation from the rate equations, is given in 
Supplementary Information, section 6. 

We first developed the chemistry necessary for the operation of 
1 on a simpler [2]rotaxane (a ring threaded on a dumbbell-shaped 
axle) system, 3 (Fig. 2). [2]Rotaxane 3 was prepared from (R)-3- 
amino-1,2-propanediol (see Supplementary Information, sections 
1.2.1 and 1.3.1). When rotaxane 3 was treated with Fmoc-Cl in the 
presence of a bulky carbonate-forming catalyst, (R)-5 (Fig. 2a), the 
macrocycle was predominantly trapped in the FumD2 compartment 
(up to 17:83 FumH2-4:FumD2-4, as shown by $^{1}$H nuclear magnetic 
resonance (NMR) spectroscopy in Fig. 2b; other reaction conditions 
led to poorer discrimination between the compartments). This result 
confirms that catalyst (R)-5, in its acylated intermediate form (Fig. 2a), 
can distinguish between the two positional isomers of the rotaxane 
that interconvert through the macrocycle shuttling between the two 
fumaramide residues, and that the acylated intermediate preferentially 
reacts with the hydroxyl group when the macrocycle is on the 
FumD2 group, that is, $k_\mathrm{far\text{-}attach} > k_\mathrm{close\text{-}attach}$. Although a chiral catalyst 
(and chiral motor) was used, mainly for synthetic convenience, the 
positional bias of the Fmoc addition is almost independent of catalyst 
handedness (see Supplementary Information, section 5) and stems 
from one macrocycle binding site being close to the site of reaction 
on the axle, and the other far away. 

With a directional bias established for the Fmoc addition step, we 
next investigated the Fmoc cleavage reaction. A solution of 20:80 
FumH2-4:FumD2-4 in dichloromethane (CH2Cl2) was treated with 
triethylamine (NEt3) (Fig. 2c). The reaction was sampled at various 
times, and before all the rotaxane Fmoc groups had been cleaved 
$^{1}$H NMR analysis of the recovered rotaxane 4 showed the ratio of 
FumH2:FumD2 to be unchanged from the starting ratio (for example, 
rotaxane 4 after 67% formation of 3, Fig. 2d). Thus the Fmoc groups 
are cleaved from FumH2-4 and FumD2-4 at the same rate; the position 
of the macrocycle in rotaxane 4 does not influence the rate of Fmoc 
cleavage, that is, $k_\mathrm{far\text{-}cleave} = k_\mathrm{close\text{-}cleave}$. 
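
How a biased attachment step and an unbiased cleavage step combine can be sketched with a few lines of arithmetic. The model below is an illustration, not the kinetic treatment in the Supplementary Information: it assumes that shuttling in the hydroxyl-bearing rotaxane is fast compared with attachment, so the trapped product ratio is population times rate constant for each positional isomer, and that cleavage removes both isomers in proportion. The 50:50 shuttling population, the per-round turnover and all function names are hypothetical.

```python
def trapped_fraction_far(pop_far, k_far, k_close):
    """Fraction of Fmoc-capped product with the ring trapped on the far site.

    Assumes fast shuttling, so each isomer reacts in proportion to
    (equilibrium population) x (attachment rate constant).
    """
    pop_close = 1.0 - pop_far
    return pop_far * k_far / (pop_far * k_far + pop_close * k_close)


def rate_ratio(frac_far, pop_far):
    """Invert the relation above to obtain k_far / k_close."""
    pop_close = 1.0 - pop_far
    return (frac_far / (1.0 - frac_far)) * (pop_close / pop_far)


def relax(frac_far_start, frac_far_product, turnover, rounds):
    """Distribution of capped species after repeated cleave/re-attach rounds.

    `turnover` is the fraction of molecules that lose and regain an Fmoc
    group per round; cleavage is taken to be position-independent.
    """
    x = frac_far_start
    for _ in range(rounds):
        x = (1.0 - turnover) * x + turnover * frac_far_product
    return x


# 17:83 trapped product, with an assumed 50:50 shuttling population.
print(f"k_far / k_close ~ {rate_ratio(0.83, 0.5):.1f}")
# Starting with every ring on the near site and cycling repeatedly.
print(f"far-site fraction after 30 rounds: {relax(0.0, 0.83, 0.2, 30):.2f}")
```

Because cleavage does not discriminate between the isomers, the distribution in this model relaxes towards the product ratio of the attachment step regardless of where it starts.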

Next, conditions were established under which both the Fmoc 
attachment and cleavage reactions take place in the same reaction 
mixture (see Supplementary Information, section 2). In a typical 
procedure, the rotaxane (3 or 4) and (R)-5 were dissolved in CH2Cl2 
and KHCO3 was added (to regenerate NEt3 from hydrochloride salts 
formed by the cleavage reaction). Solutions of the Fmoc-Cl fuel and 
Et3N in CH2Cl2 were mixed together initially and then more Fmoc-Cl 
was slowly and continuously added using a syringe pump for as long as 
the motor was required to run. Subjecting rotaxane 4 with an initial 
macrocycle distribution of 100:0 FumH2:FumD2 to these operating 
conditions resulted in 4 with a distribution of 17:83 FumH2:FumD2 
at the steady state (Supplementary Figs 3 and 4). That the Fmoc 
formation and cleavage reactions run concurrently was further 
confirmed by showing that a deuterium (D)-labelled Fmoc group on 
the rotaxane could be exchanged for an unlabelled one under these 
operating conditions. Treatment of D2-(33:67 FumH2:FumD2)-4 with 
unlabelled Fmoc-Cl under the operating conditions formed (17:83 
FumH2:FumD2)-4, with a loss of D2-label from 63% to 10% 
incorporation after 18 h, as shown by mass spectrometry. Switching the chemical 
Figure 1 | Operation of a chemically fuelled [2]catenane rotary motor. 
The benzylic amide macrocycle (blue) binds to one or other of the two 
fumaramide sites (green) of the cyclic track. Bulky groups (red) sterically 
block passage of the small blue ring and trap it in one compartment or 
the other (the right- or left-hand side of the track as shown). Cleavage 
of one of the bulky groups through a chemical reaction (loss of orange 
ball) allows the small ring to shuttle back and forth between the two 
fumaramide sites on the track via Brownian motion along the 
unblocked pathway. Attachment of another bulky group (addition of 
red ball) through another chemical reaction (under the same 
conditions) locks in any change of location of the small ring (that is, 
if the ring has changed compartment it is prevented from returning 
to the original one). If the kinetics for blocking group attachment 
are faster when the small ring is far from the reactive site 
($k_\mathrm{far\text{-}attach} > k_\mathrm{close\text{-}attach}$, for example, for steric reasons), but the cleavage 
reaction occurs at a rate independent of the small ring position 
($k_\mathrm{far\text{-}cleave} = k_\mathrm{close\text{-}cleave}$), then the small ring will directionally rotate around 
the larger one. One of the fumaramide groups is deuterium-labelled to 
distinguish the compartments and allow the location of the small ring 
to be determined by $^{1}$H NMR spectroscopy. Compound 1 is the catenane 
with two Fmoc groups attached. Compound 2 is the catenane with one 
Fmoc group attached close to the labelled fumaramide group. Compound 2′ 
is the catenane with one Fmoc group attached close to the unlabelled 
fumaramide group. The italicised prefix (FumH2- or FumD2-) 
refers to the location of the benzylic amide macrocycle in 1, 2 or 2′. 
Thick arrows indicate the major pathway of a reaction, dashed arrows 
indicate the minor pathway and thin arrows indicate pathways that 
occur at similar rates. The blue arrow indicates the direction of net 
transport of the benzylic amide macrocycle when $k_\mathrm{far\text{-}attach} > k_\mathrm{close\text{-}attach}$ 
and $k_\mathrm{far\text{-}cleave} = k_\mathrm{close\text{-}cleave}$. 
Figure 2 | [2]Rotaxane model system to demonstrate directional bias for 
Fmoc addition and position-independent Fmoc cleavage. a, Positional 
bias of the macrocycle in Fmoc attachment to rotaxane 3. Reaction 
conditions: Fmoc-Cl (5 equivalents), (R)-5 (5 equivalents), CH2Cl2, 
room temperature, duration 18 h. b, Partial $^{1}$H NMR spectra (600 MHz, 
CD3OD:CDCl3 3:1, 300 K) of 100:0 FumH2-4:FumD2-4 (obtained from 
an unambiguous synthetic route) (top $^{1}$H NMR spectrum) and 17:83 
FumH2-4:FumD2-4 formed using (R)-5 as the carbonate-forming catalyst 
(bottom $^{1}$H NMR spectrum). Residual solvent peaks are shown in grey. 
The lettering corresponds to the proton labelling in a. Full spectral 
assignments are given in Supplementary Information, section 1.3. 


fuel being added to the deuterium-labelled version (D2-Fmoc-Cl) then 
fully restored the labelled form of D2-(17:83 FumH2:FumD2)-4 after 
66 h (see Supplementary Fig. 4). 

We note that these results indicate that a macrocycle on a polymer 
consisting of repeat units of 4 without the terminal stopper groups 
should inexorably be transported towards one end of the polymer 
chain by treatment with the Fmoc-Cl fuel under these reaction 
conditions. In other words, rotaxane 4 is a functioning engine system for 
a chemically fuelled linear molecular motor. 

We applied the same principles to the synthesis and operation of a 
chemically fuelled [2]catenane rotary molecular motor. [2]Catenane 2′ 
was prepared from (R)-3-amino-1,2-propanediol (see Supplementary 
Information, sections 1.2.2 and 1.3.2). The benzylic amide macrocycle 
distribution between the two fumaramide sites in 2′ is approximately 
40:60 FumH2:FumD2 (estimated from the $^{1}$H NMR shielding of the 
FumH2 protons in 2′ compared to FumH2-1 and consistent with the 


In the bottom spectrum of b regions 6.3–6.6 parts per million (p.p.m.) and 
5.4–6.0 p.p.m. are scaled vertically 3× compared to region 8.0–8.6 p.p.m. 
c, Lack of macrocycle positional bias for Fmoc cleavage from rotaxane 4. 
Reaction conditions: NEt3 (5 equivalents), CH2Cl2, room temperature, 
2 h, yield 67%. d, Partial $^{1}$H NMR spectra (600 MHz, CD3OD:CDCl3 3:1, 
300 K) of 20:80 FumH2-4:FumD2-4 (top $^{1}$H NMR spectrum) and FumH2-4: 
FumD2-4 (bottom $^{1}$H NMR spectrum) recovered after cleavage of about 
two-thirds of the Fmoc groups. Residual solvent peaks are shown in grey. 
The lettering corresponds to the proton labelling in a. Regions 
6.3–6.6 p.p.m. and 5.4–6.0 p.p.m. are scaled vertically 3× compared to 
region 8.0–8.6 p.p.m. 


results of carbonate formation promoted by pyridine, a small catalyst 
(Supplementary Table 2)). The energy barrier for macrocycle exchange 
between the fumaramide sites in related rotaxanes is ~16 kcal mol$^{-1}$ 
in CDCl3 (ref. 25), suggesting that macrocycle shuttling in 2′ occurs 
hundreds of times a second under the motor operating conditions. 
The Fmoc attachment–cleavage chemistry of 2 (Supplementary 
Information, section 3) mirrored that of rotaxane 3. When catenane 
2′ was treated with Fmoc-Cl in the presence of catalyst (R)-5, the 
macrocycle was predominantly trapped in the FumH2 
compartment (80:20 FumH2-1:FumD2-1, Supplementary Fig. 5a and b), that 
is, $k_\mathrm{far\text{-}attach} > k_\mathrm{close\text{-}attach}$. A model catenane was also prepared (see 
Supplementary Fig. 5c and d), replacing one Fmoc carbonate group 
of 1 with an analogous Fmoc-methinyl ester. The substitution of an 
oxygen atom for a carbon atom ensures that cleavage of this group 
cannot occur under the motor operating conditions. This enabled the 
Fmoc cleavage reaction to be studied in a catenane possessing only one 
                          Composition (%)     D2-labelled Fmoc incorporated (%)
Cycle  Step   Time (h)    1       2/2′        1        D2-1
-      -      0           100     0           100      0
1      (i)    6           32      57          -        -
1      (ii)   24          94      6           39       61
2      (iii)  29          34      55          -        -
2      (iv)   32          100     0           68       32

Figure 3 | Exchange of Fmoc groups during stepwise operation of 
catenane 1. Reaction conditions: (i) NEt3 (8 equivalents), (ii) D2-Fmoc-Cl 
(16 equivalents), (R)-5 (16 equivalents), (iii) NEt3 (10 equivalents), (iv) 
Fmoc-Cl (16 equivalents), (R)-5 (16 equivalents). Some diol (up to 11%) 
is also formed during the Fmoc-cleavage steps under these conditions 
(10 equivalents NEt3), see Supplementary Table 1. Asterisks indicate 
D-labelled Fmoc (red ball) or dibenzofulvene (orange ball). The 
percentage of the D2-labelled Fmoc incorporated into the catenane was 
determined by relative intensity of the [M + Na]$^{+}$ signal in electrospray 
mass spectrometry. Dashes indicate that the degree of deuterium 
incorporation was not measured at this point. 


detachable group. As with the rotaxane model, the control experiment 
demonstrated that the rate of Fmoc cleavage is not affected by the 
position of the macrocycle in the catenane (Supplementary Fig. 5d), 
that is, kfar-cleave ≈ kclose-cleave.

To confirm that both the Fmoc addition and cleavage reactions take 
place with catenane 1, we first demonstrated that the reactions can 
occur sequentially (Fig. 3). Catenane 1 was treated with NEt3 (8 equivalents)
and after 6 h 57% of the catenanes had lost one Fmoc group


(forming 2/2’; Fig. 3, cycle 1) with a further 11% of catenanes having 
had both Fmoc groups cleaved. At this point deuterium-labelled fuel, 
D2-Fmoc-Cl, activated with (R)-5 was added, leading to almost com- 
plete derivatization of the catenane hydroxyl groups after 24h (94% 1). 
At the end of this cleavage-addition cycle (Fig. 3, cycle 1) electrospray 
mass spectrometry confirmed that the D2-labelled Fmoc groups 
had been incorporated from the fuel into the catenane motor (Fig. 3, 
cycle 1). The resulting mixture was then treated with a second cycle of 
NEt3, leading after 5 h to 55% of the catenane with only one Fmoc group,
2/2’ (Fig. 3, cycle 2). Subsequent addition of unlabelled Fmoc-Cl 
regenerated 1 with a majority of the Fmoc groups without deuterium 
labels (Fig. 3, cycle 2). Thus, over two complete operational cycles, the 
catenane molecules are shown to sequentially cleave and then add an 
Fmoc group from the fuel being supplied during that cycle, then cleave 
and add another Fmoc group from a second batch of fuel. 

To monitor the catenane rotary motor during autonomous operation,
catenane 1 with 80% of the small rings on the unlabelled fumaramide
binding site (80:20 FumH2-1:FumD2-1) was treated with Fmoc-Cl,
(R)-5, Et3N and KHCO3 in CH2Cl2 (Fig. 4). For autonomous operation
we used conditions under which the Fmoc groups are added by
the Fmoc-Cl fuel and cleaved with no discernible accumulation of
diol (that is, 1.5 equivalents Et3N instead of the 8 to 10 equivalents
employed in the sequential operations) and the distribution of the
catenane positional isomers was measured over time by 1H NMR spectroscopy
(Fig. 4). Under these conditions the initial macrocycle distribution
changed from 80:20 FumH2:FumD2 to 55:45 (Fig. 4b). Shortly
after the supply of Fmoc-Cl fuel is cut off, no further change in the 
distribution of the rings between the compartments occurs (that is, the 
motor stops working). However, cleavage of the Fmoc groups slowly 
continues, unless the basic reaction medium is quenched, forming 2/2’ 
and eventually catenane diol. 

The ratio of the distribution of the rings between the compartments 
falls towards 1:1 as a direct consequence of the functioning of the 
motor as each Fmoc-cleavage reaction serves to equilibrate the distri- 
bution of rings between the compartments. Although the Fmoc attach- 
ment reaction biases clockwise rotation of the small ring around the 
track, it does not bias its average position on the track. This is because 
Fmoc attachment to catenane 2’ biases the small ring to the left-hand 
compartment, whereas Fmoc attachment to catenane 2 biases the 
small ring to the right-hand compartment. As was demonstrated for 

Figure 4 | Directional transport of the macrocycle monitored by
1H NMR spectroscopy. a, Reaction conditions: (R)-5 (5 equivalents),
KHCO3 (20 equivalents), CH2Cl2, room temperature, Fmoc-Cl, CH2Cl2,
added via syringe pump at 2.4 equivalents per hour, then NEt3 (1.5
equivalents) after 1 h of Fmoc-Cl addition. b, Partial 1H NMR spectra
(500 MHz, CDCl3:CD3OD 1:1, 300 K) of 80:20 FumH2-1:FumD2-1 and
after operation for 4 h, 24 h and 48 h. The region 5.7-6.3 p.p.m. is scaled
vertically 6× compared to region 8.3-8.8 p.p.m. The two macrocycle
positional isomers of catenane 1 each exist as four tertiary amide rotamers.

rotaxane 4 (Fig. 2), the change in the macrocycle distribution that
occurs with catenane 1 under the operating conditions (Fig. 4) shows 
that the Fmoc groups are being cleaved, allowing the small ring to 
move between compartments by Brownian motion, and the transiently 
generated hydroxyl groups are being derivatized under consumption 
of the Fmoc-Cl fuel, that is, confirming that the catenane rotary motor 
operates autonomously as long as unspent chemical fuel is present. 

Proving directional rotation in molecular motors is challenging, 
not least because each rotational cycle returns the motor compo- 
nents to their starting positions. Evidence for directionality in step- 
wise-operated small-molecule motors has previously been provided 
by determining the position of the components at multiple points in a 
motor’s cycle and determining the rates of different pathways to those 
positions'*”°, However, in a continuously operating motor with only 
two minimum energy positions of the components, such as 1, this 
approach is not possible. Nevertheless, fuel-driven directional rotation 
in 1 could be unequivocally established through a series of individually 
provable premises, a form of deductive logic commonly used in math- 
ematical proofs. If all of the premises are experimentally demonstrated 
to be correct, and the terms linking the premises to the conclusion are 
valid, then the conclusion reached is necessarily true. Net directional 
rotation of the chemically fuelled rotary motor 1 through an informa- 
tion ratchet mechanism (see Supplementary Information, section 6, 
for an explanation of how directionality intrinsically follows from the 
rate equations) is demonstrated through experimental verification of 
each of three premises: 

(1) Under the motor operating conditions the Fmoc attachment 
reaction to the catenane (from Fmoc-Cl) and the Fmoc cleavage 
reaction from the catenane (forming CO2 and dibenzofulvene) both
occur. This is shown by the experiments that demonstrate that D2-Fmoc
groups add to the catenane when using D2-Fmoc fuel and are
then replaced by unlabelled Fmoc groups upon switching to unlabelled 
fuel (Fig. 3). Fmoc-Cl is not simply being destroyed in the reaction, it 
is being continuously added and cleaved from the catenane under the 
operating conditions. 

(2) Under the motor operating conditions Fmoc attachment to the 
catenane hydroxyl group (2 or 2’) results in a bias in the distribution 
of the macrocycle between the compartments in the resulting di-Fmoc 
catenane (1) (that is, kfar-attach ≠ kclose-attach). This is shown by the positional
bias in the Fmoc attachment to 2’ experiments (Supplementary
Fig. 5a and b), analogous to that shown for rotaxane 3 in Fig. 2a. 

(3) Under the motor operating conditions cleavage of one Fmoc 
group from catenane 1 occurs at a rate independent of the position 
of the macrocycle (that is, kfar-cleave = kclose-cleave). This is proved by the
catenane Fmoc cleavage experiments (Supplementary Fig. 5c and d),
analogous to that shown for rotaxane 20:80 FumH2:FumD2-4 in Fig. 2c.

The effects of the net-directional movement of the rings around the 
catenane track are directly observed in the experiment shown in Fig. 4. 
The catenane ring distribution can only change through the benzylic 
amide macrocycles shuttling between the fumaramide sites when an 
Fmoc group is transiently cleaved, and the directional bias of the ring 
movement under these conditions is that experimentally determined 
in proving premises (2) and (3). 

Chirality is not necessary for directional rotation: the wheels of 
a bicycle travelling down a road rotate clockwise with respect to an 
observer on one side of the road and counter-clockwise with respect
to an observer on the other”*. However, the chiral centres of 1 differ- 
entiate the two faces of the track, defining the direction of the ring 
rotation in 1 as clockwise with respect to the (R,R)-stereochemistry 
of the molecular motor. 

Just as motor proteins are catalysts for the hydrolysis of ATP, the 
catenane (1) and rotaxane (4) motors are catalysts for the conversion 
of Fmoc-Cl and Et3N into dibenzofulvene, CO2 and Et3NHCl. For
both the biological and synthetic motors it is the free energy released 
by the motor-catalysed exergonic reactions that drives the directional 
displacement of the motor components. In principle, the consumption 
of two molecules of Fmoc-Cl is required to power one 360° clockwise
ratcheted rotation of the benzylic amide macrocycle around the 
catenane track of 1. In practice, the directionality of the fuelled rotation 
is good: the 80:20 positional bias observed in the Fmoc-attachment 
reaction means that for every ten molecules of Fmoc-Cl that react 
with the track the benzylic amide macrocycle makes on average three 
net directional full rotations about the track. However, unlike motor 
proteins”®, rotaxane 3 and catenane 2/2’ are poor catalysts for the 
destruction of their chemical fuel, and 1 and 4 react with base to form 
CO2 and dibenzofulvene only a few times faster than the background
base-promoted decomposition of Fmoc-Cl. 
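
The estimate of three net rotations per ten fuel molecules can be reproduced with a short calculation. This is a minimal sketch that follows the accounting stated in the text (two Fmoc-Cl molecules per ideal rotation, and a net forward bias equal to the difference between the 80% and 20% attachment outcomes); the function and parameter names are illustrative and do not come from the paper.

```python
def net_rotations(fuel_molecules, forward_fraction, fuel_per_rotation=2):
    """Estimate net full rotations for a number of reacted fuel molecules.

    forward_fraction is the share of attachment events that trap the ring in
    the 'forward' compartment (0.80 for the 80:20 bias quoted in the text).
    """
    backward_fraction = 1.0 - forward_fraction
    net_bias = forward_fraction - backward_fraction  # forward minus backward share
    ideal_rotations = fuel_molecules / fuel_per_rotation
    return ideal_rotations * net_bias


rotations = net_rotations(fuel_molecules=10, forward_fraction=0.80)
print(f"{rotations:.1f} net rotations per 10 Fmoc-Cl molecules")
```

With no positional bias (forward_fraction = 0.5) the same function returns zero, which matches the idea that fuel is then consumed without net directional motion.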

From the rate at which the ratio of the macrocycle distribution 
between the compartments in the catenane falls to unity, the speed 
of net-directional rotation in the experiment shown in Fig. 4b can be 
calculated to be ~12h for each 360° rotation. This might be increased 
by raising the temperature and/or increasing the concentration and/ 
or rate of addition of the fuel, but changes in these parameters might 
also affect the net directionality of rotation. 

Synthetic chemically fuelled molecular motors 1 and 4 join light- 
driven molecular rotary motors as engines with the potential to power 
tasks in molecular nanotechnology’. Finding ways to link the position 
of the ring to more effective catalytic decomposition of the fuel should 
allow for the development of faster and more efficient small-molecule 
motors powered by chemical fuels. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

General method for autonomous operation of rotary catenane motor 1. To a
solution of 1 (5 mg, 2.6 µmol) in CH2Cl2 (0.3 ml) was added (R)-5 (5 equivalents,
8.2 mg, 13.0 µmol) and KHCO3 (20 equivalents, 5.2 mg, 52 µmol). A solution of
Fmoc-Cl (240 mg, 0.93 mmol) in CH2Cl2 (1.0 ml) was added at a rate of 6.7 µl h⁻¹.
After 1 h NEt3 (1.5 equivalents, 0.55 µl, 3.9 µmol) was added and Fmoc-Cl addition
continued at a rate of 6.7 µl h⁻¹ (6.2 µmol h⁻¹ Fmoc-Cl for 2.6 µmol of 1) for as
long as the motor was required to run. After full consumption of the chemical fuel
(Fmoc-Cl) the catenane motor was recovered by addition of 1 M HCl (aqueous)
(10 ml) and the aqueous layer extracted with CH2Cl2 (3 × 20 ml). The combined
organic layers were washed with brine, dried over Na2SO4 and concentrated under
reduced pressure. Purification by preparative thin layer chromatography (SiO2,
CH2Cl2:EtOH 95:5) gave pristine 1.
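
The fuel delivery rate quoted in this procedure can be cross-checked from the stock solution and the pump rate. The sketch below assumes a molar mass of 258.70 g mol⁻¹ for Fmoc-Cl (C15H11ClO2), which is not stated in the text; the function name is illustrative.

```python
FMOC_CL_MOLAR_MASS = 258.70  # g/mol, assumed from the formula C15H11ClO2


def fuel_delivery(mass_mg, volume_ml, pump_rate_ul_per_h, motor_umol):
    """Return (micromoles of fuel per hour, equivalents of fuel per hour)."""
    total_umol = mass_mg / FMOC_CL_MOLAR_MASS * 1000.0  # mg -> mmol -> µmol
    conc_umol_per_ul = total_umol / (volume_ml * 1000.0)  # µmol per µl of stock
    umol_per_h = conc_umol_per_ul * pump_rate_ul_per_h
    return umol_per_h, umol_per_h / motor_umol


umol_per_h, equiv_per_h = fuel_delivery(
    mass_mg=240.0, volume_ml=1.0, pump_rate_ul_per_h=6.7, motor_umol=2.6
)
print(f"{umol_per_h:.1f} µmol/h, {equiv_per_h:.1f} equivalents/h")
```

Under these assumptions the result agrees with the 6.2 µmol h⁻¹ given above and with the 2.4 equivalents per hour given in the Fig. 4 caption.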

FeO2 and FeOOH under deep lower-mantle
conditions and Earth’s oxygen-hydrogen cycles


Qingyang Hu!**, Duck Young Kim!*, Wenge Yang)?*, Liuxiang Yang!?, Yue Meng’, Li Zhang? & Ho-Kwang Mao!” 


The distribution, accumulation and circulation of oxygen and 
hydrogen in Earth's interior dictate the geochemical evolution 
of the hydrosphere, atmosphere and biosphere’. The oxygen- 
rich atmosphere and iron-rich core represent two end-members 
of the oxygen-iron (O-Fe) system, overlapping with the entire 
pressure-temperature-composition range of the planet. The 
extreme pressure and temperature conditions of the deep interior 
alter the oxidation states', spin states” and phase stabilities*4 of 
iron oxides, creating new stoichiometries, such as Fe4O5 (ref. 5) and
Fe5O6 (ref. 6). Such interactions between O and Fe dictate Earth’s
formation, the separation of the core and mantle, and the evolution 
of the atmosphere. Iron, in its multiple oxidation states, controls 
the oxygen fugacity and oxygen budget, with hydrogen having a 
key role in the reaction of Fe and O (causing iron to rust in humid 
air). Here we use first-principles calculations and experiments 
to identify a highly stable, pyrite-structured iron oxide (FeO2) at
76 gigapascals and 1,800 kelvin that holds an excessive amount
of oxygen. We show that the mineral goethite, FeOOH, which
exists ubiquitously as ‘rust’ and is concentrated in bog iron ore,
decomposes under the deep lower-mantle conditions to form FeO2
and release H2. The reaction could cause accumulation of the heavy
FeO2-bearing patches in the deep lower mantle, upward migration
of hydrogen, and separation of the oxygen and hydrogen cycles. 
This process provides an alternative interpretation for the origin 
of seismic and geochemical anomalies in the deep lower mantle, as 
well as a sporadic O2 source for the Great Oxidation Event over two
billion years ago that created the present oxygen-rich atmosphere. 

We started with α-Fe2O3 (haematite) powder loaded in cryogenically
condensed liquid O2 in the sample chamber of a diamond-anvil
cell (DAC) (see Methods). The pressure was initially raised to 78 GPa;
no reaction between haematite and O2 was observed at ambient temperature
(~293 K). When the sample was heated to 1,800 K in situ at high
pressure using a Nd-doped Y3Al5O12 laser system (ref. 7), it became
semi-transparent (Fig. 1b), suggesting that a chemical reaction had
occurred. The X-ray diffraction (XRD) pattern shows new sets of
sharp, single crystal-like diffraction spots (Fig. 1a) that are readily
distinguishable from the original broad and smooth texture of the
Fe2O3 powder pattern. Integration of the diffraction spots (Fig. 2a)
shows eight peaks that do not match any known Fe2O3 (refs 3, 4) or
O2 phases (ref. 8), but can be unambiguously indexed to a rather simple cubic
structure (Fig. 1c) with the space group Pa3̄ (Table 1).

The spotty XRD pattern is ideally suited to the multigrain crystallography
method (ref. 9) recently adopted for high-pressure research (refs 10, 11).
The spots are treated as diffraction from multiple single crystals, and
sorted according to individual crystal orientation matrices. At least 33
single crystallites were identified by the multigrain crystallography
software. All symmetry-allowed spots for Pa3̄ are present, and
all observed spots can be accounted for by the Pa3̄ unit cell. The details
of five crystallites are presented in Extended Data Tables 1 and 2.


The new phase has a structure identical to that of pyrite (FeS2) with
oxygen replacing sulphur, the next-row chalcogen element. Results
from Rietveld refinement are shown in Fig. 2. For this structure,
oxygen atoms not only form O-Fe bonds of 1.792 Å, but also O-O
bonds of 1.937 Å (Extended Data Fig. 1 and Extended Data Table 3),
that are typical of peroxide. Analogous to the archetypical pyrite, the
iron in FeO2 is considered to be ferrous. Curiously, the oxidation of
Fe2O3 to FeO2 reduces Fe3+ to Fe2+. This can be understood with the
concurrent oxidation of O?~ to O?~ and O° as indicated by the O-O 
bond. In other words, this material can be viewed as FeO holding
extra O2. We shall refer to the pyrite phase of Pa3̄ peroxide as the
P-phase. 

To assess the stability of the P-phase under pressure, we calculated 
the volume change of the reaction at 76 GPa as follows: 


2Fe2O3 + O2 = 4FeO2 (1)


Here we used molar volumes of 35.69 Å³ for Fe2O3 in the Aba2
structure, 12.79 Å³ for O2 in the O8 cluster (ref. 12), and 20.76 Å³ for the
P-phase. The reaction has a volume shrinkage of ΔV/V = −1.4%.
Pressure lowers the Gibbs free energy of the reaction by ΔG = ∫ΔV dP,
thus favouring the formation of FeO2 at increasing pressure. The
P-phase is non-quenchable to ambient conditions; its XRD peaks disappear
below 31 GPa during pressure release at 300 K (Extended Data
Fig. 2).
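
The volume change of reaction (1) can be recomputed from the three volumes quoted above. This is a minimal sketch that takes the quoted values as volumes per formula unit; the variable names are illustrative. With the rounded inputs the result comes out slightly above −1.4%, so the small difference from the quoted figure presumably reflects rounding of the inputs.

```python
V_FE2O3 = 35.69  # Å^3 per Fe2O3 formula unit (Aba2 structure), from the text
V_O2 = 12.79     # Å^3 per O2 (O8 cluster), from the text
V_FEO2 = 20.76   # Å^3 per FeO2 formula unit (P-phase), from the text


def relative_volume_change():
    """Relative volume change for 2 Fe2O3 + O2 = 4 FeO2."""
    reactants = 2 * V_FE2O3 + V_O2
    products = 4 * V_FEO2
    return (products - reactants) / reactants


def volume_per_formula_unit(a_angstrom, z):
    """Volume per formula unit of a cubic cell with edge a and z formula units."""
    return a_angstrom ** 3 / z


dv = relative_volume_change()
print(f"dV/V = {100 * dv:.2f}%")
# Cross-check of the P-phase volume against the Table 1 lattice parameter,
# assuming four FeO2 units per cell as in the pyrite structure.
print(f"V(FeO2) = {volume_per_formula_unit(4.3628, 4):.2f} Å^3")
```

A negative ΔV means the ∫ΔV dP term lowers the free energy of the products relative to the reactants as pressure rises, which is the argument made in the text.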

We expanded our study from the O-Fe binary to the O-Fe-H ternary,
and showed that the P-phase could also be synthesized under
moderately reducing conditions coexisting with H2. We studied the
Fe2O3-H2O join, in which the most stable compound FeOOH occurs
ubiquitously as rust on Earth’s surface, in the deep ocean, on meteorites,
on other planets, and on moons, in the α-, β-, γ-, δ-, or ε-FeOOH
forms. It concentrates in bog iron ore deposits, which have been used
as a copious, renewable resource of iron ever since the Iron Age. The
α-FeOOH (goethite) transforms to the ε-phase at high pressure and
decomposes to Fe2O3 + H2O at high temperature. Its pressure-temperature
phase boundaries and pressure-volume-temperature equations
of state have been previously determined up to 29.4 GPa and 523 K
(ref. 13).

We compressed goethite in Ne pressure medium to 92 GPa and 
laser-heated it to 2,050 K. XRD clearly shows the conversion to the 
P-phase (Fig. 2b), indicating the following reaction: 


2FeOOH = 2FeO2 + H2 (2)


The H2 at 92 GPa and 2,050 K is far above its melting temperature
of 900 K (ref. 14), and the H2 fluid is highly mobile. We used Raman
spectroscopy to search for H2, and clearly observed H2 vibron
peaks at 5,180 cm⁻¹ (Fig. 3), corresponding to H2 in the Ne pressure
medium (ref. 15). The production of free H2 indicates a moderately reducing
condition. 


1Center for High Pressure Science and Technology Advanced Research (HPSTAR), Shanghai 201203, China. 2Geophysical Laboratory, Carnegie Institution, Washington DC 20015, USA. 3High
Pressure Synergetic Consortium (HPSynC), Geophysical Laboratory, Carnegie Institution, Argonne, Illinois 60439, USA. 4High Pressure Collaborative Access Team (HPCAT), Geophysical Laboratory,
Carnegie Institution, Argonne, Illinois 60439, USA.

Figure 1 | The FeO2 phase. a, A two-dimensional XRD image of the
Fe2O3 + O2 experiment at 76 GPa after laser heating, collected at
ω = −8.5°, with X-ray wavelength of 0.6199 Å. The original image in
2θ-η polar coordinates is converted into Cartesian coordinates. The
newly developed sharp spots are from several single crystals of FeO2,
while smeared powder rings are from the remaining Fe2O3. Four selected
diffraction spots and their Miller indices from the P-phase are shown
below. Their observed Bragg angles (2θ), rotation angles (ω) and azimuthal
angles (η) are listed in Extended Data Table 1. b, A microphotographic
image of FeO2 through diamond culets. c, Structural representation of the
pyrite-type FeO2.


As a chemical equivalent of Fe2O3 + H2O, the goethite reaction (2)
demonstrates that the water in deep Earth (ref. 16) can provide an abundant
source of O2, that is:


Fe2O3 + H2O = 2FeO2 + H2 (3)


Although the exact quantity of H2O in the mantle is uncertain, the 
existence of H2O there in hydrates or other forms is well accepted!”~. 
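
Reactions (1) to (3) can be checked for atom balance with a few lines of code. This is a minimal sketch with an illustrative formula parser that handles only simple formulae without brackets; none of the names come from the paper.

```python
import re
from collections import Counter


def count_atoms(formula, multiplier=1):
    """Count atoms in a simple formula such as 'Fe2O3' or 'FeOOH' (no brackets)."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += multiplier * (int(digits) if digits else 1)
    return counts


def is_balanced(reactants, products):
    """Each side is a list of (coefficient, formula) pairs."""
    def total(side):
        result = Counter()
        for coeff, formula in side:
            result.update(count_atoms(formula, coeff))
        return result
    return total(reactants) == total(products)


reactions = {
    "(1)": ([(2, "Fe2O3"), (1, "O2")], [(4, "FeO2")]),
    "(2)": ([(2, "FeOOH")], [(2, "FeO2"), (1, "H2")]),
    "(3)": ([(1, "Fe2O3"), (1, "H2O")], [(2, "FeO2"), (1, "H2")]),
}
for label, (lhs, rhs) in reactions.items():
    print(label, "balanced" if is_balanced(lhs, rhs) else "NOT balanced")
```

The check also makes the equivalence in the text explicit: 2FeOOH and Fe2O3 + H2O contain the same atoms, so reactions (2) and (3) share the same products.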

We were initially motivated by our computational predictions. 
A first-principles-based structure-searching algorithm”” allows us to 
investigate the energy landscape of Fe-O compounds under pressure. 
At 100 GPa and 300 GPa, we conducted extensive prediction of Fe-O 
compounds using this model, and FeO2 appears as one of the most
stable phases. In the convex hull curve (Fig. 4a and b), FeO2 stands out
as energetically extremely favourable (the deepest hull). Our phonon
calculations of FeO2 show stable phonons at all pressures (Fig. 4c).
The experimentally observed structural parameters of the P-phase
FeO2 agree exactly with the ab initio prediction (Table 1). Electron
localization function calculations and Bader analysis shed light on 
the chemical bonding nature of FeO2. Valence electrons near anionic 
O are highly localized and electron localization function minima are 
located between Fe and O atoms. 

The P-phase has been previously predicted at 100-465 GPa by a 
computational search for high-pressure iron oxides”". It is clearly one 
of the most prominent phases in the convex hull curve at all of the 
calculated pressures, which are thought to occur at Earth's centre. 

Our experiments and theoretical calculations demonstrate that if 
the surface assemblage FeOOH or Fe2O3 + H2O is thrust deeper than
1,800 km (into the deep lower mantle), it will form the P-phase. The 
frequent occurrence of such assemblages in the down-going slabs 
suggests that reaction (2) could have started as early as the accre- 
tion of the early Earth from planetesimals of assorted compositions. 
The water-rich and iron-oxide-rich fragments would release H2

Figure 2 | Integrated XRD pattern and Rietveld refinement of the
pyrite-type FeO2 phase. a, FeO2 synthesized from Fe2O3 and O2 at 76 GPa
with R indices R1 = 0.0663 and wR2 = 0.171. Blue stars belong to the
Aba2 phase of Fe2O3. We attribute the residual peaks to post-perovskite-type
(pPv) Fe2O3 and O2. b, FeO2 synthesized from FeOOH at 92 GPa,
with R indices R1 = 0.0541 and wR2 = 0.167. Red stars refer to residual
ε-FeOOH. Similar results are reproduced in multiple experiments for each
composition.


and convert to the P-phase when pressure exceeds 76 GPa. With its
high density (7.026 g cm⁻³ at 76 GPa) in comparison to the density
(5 g cm⁻³) of the mantle according to the Preliminary Reference
Earth Model (ref. 22), the P-phase would normally settle at depth, while
the light and mobile hydrogen would diffuse, infiltrate, or react to
form other volatiles, and work its way up to complete the hydrogen
cycle. A portion of the H2 might eventually escape into space. Plate

Figure 3 | Raman peak of the hydrogen Q1 vibron in dense neon.
Data points (blue circles) below 45 GPa and the top inset are taken from
Loubeyre’s hydrogen vibron measurements of H2 in the Ne matrix (ref. 15). The
blue dashed curve is the fourth-order polynomial fitting given by ref. 15.
Data points (red squares) above 80 GPa and the bottom inset are taken
from the present results of our FeOOH experiment after laser heating
and during decompression to 80 GPa. The sharp hydrogen vibron peak at
5,180-5,170 cm⁻¹ clearly indicates the hydrogen H2 in the neon matrix.
Errors of frequency are calculated from the full-width half-maximum
of the Raman peak. Pressure uncertainty is derived from multiple
measurements of diamond line shifts at the centre of the culet.

Figure 4 | Crystal structure search results. a, 100 GPa; b, 300 GPa. The
convex hull graph referenced by a dashed line shows each compound’s 
stability with respect to decomposition into individual Fe and O elements. 
Solid squares show the formation enthalpy of each compound possessing 
the lowest enthalpy within fixed stoichiometry and chemical formula (red 


tectonics would have continued to supply FeOOH, Fe2O3, and H2O
to the down-going slabs, thus accumulating P-phase to form patches 
of oxygen reservoirs in the deep lower mantle and at the same time 
sustaining the hydrogen cycle. The high mobility and rapid cycling of 
hydrogen would help to build up substantial patches of the P-phase, 
possibly detectable as seismic anomalies in the D” layer and other 
deep lower-mantle regions. 
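
The density of 7.026 g cm⁻³ quoted for the P-phase at 76 GPa can be cross-checked from the cubic lattice parameter in Table 1. The sketch below assumes four FeO2 formula units per unit cell, as in the pyrite structure, and standard atomic weights; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23    # 1/mol
M_FE, M_O = 55.845, 15.999  # g/mol, standard atomic weights


def cubic_density(a_angstrom, formula_mass, z):
    """Density in g/cm^3 of a cubic cell with edge a (Å) and z formula units."""
    cell_volume_cm3 = (a_angstrom * 1e-8) ** 3  # 1 Å = 1e-8 cm
    return z * formula_mass / (AVOGADRO * cell_volume_cm3)


rho = cubic_density(4.3628, M_FE + 2 * M_O, z=4)
print(f"density of FeO2 at 76 GPa: {rho:.3f} g/cm^3")
print(f"ratio to a 5 g/cm^3 mantle: {rho / 5.0:.2f}")
```

Under these assumptions the P-phase comes out roughly 40% denser than the mantle value cited in the text, which is the basis for the argument that it would settle at depth.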

Occasionally, the P-phase-rich patches could have been swept up 
by plumes or other large-scale mantle dynamic processes. Once they 
reached the middle lower mantle at depths of less than 1,500km, the 
P-phase would decompose according to the reversal of equation (1) 
to provide a sporadic source of extra O2.

The idea of such P-phase-rich patches is in harmony with the gen- 
eral concept of an overall reducing lower mantle. First, reaction (2) 
shows that FeO2 can coexist with H2, indicating a moderately reducing
condition in spite of its high oxygen content. Second, our knowledge 
of oxygen fugacity in the deep Earth is based on very limited sam- 
pling from the lower mantle (with none from the deep lower mantle). 
Sampling is chemically selective towards cratonic lithosphere”. 
Considering the geochemical diversity of the mantle”, the major 


Table 1 | Indexed peaks of the XRD pattern from the P-phase at 76 GPa

hkl   dobs (Å)     dcal (Å)   Δd (Å)    dsimu (Å)
111   2.5190(1)    2.5189     −0.0001   2.500
200   2.1817(5)    2.1814     −0.0003   2.165
210   1.9508(3)    1.9511     0.0003    1.937
211   1.7821(15)   1.7811     −0.0010   1.768
220   1.5421(4)    1.5425     0.0004    1.531
311   1.3151(5)    1.3154     0.0003    1.306
222   1.2600(7)    1.2594     −0.0006   1.250
320   1.2093(12)   1.2100     0.0007    1.201

The lattice parameter is calculated to be a = 4.3628(1) Å. The wavelength of the synchrotron X-ray
is 0.4344 Å. The d-spacings at 76 GPa are as follows: dobs are the observed peak positions; dcal is
calculated from the averaged lattice parameter; Δd = dcal − dobs; and dsimu is the first-principles
simulation. The estimated standard deviations are included in parentheses.
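
The calculated d-spacings in Table 1 follow from the cubic relation d = a / sqrt(h² + k² + l²). The sketch below recomputes them from the lattice parameter and compares them with the observed values; the dictionary and function names are illustrative.

```python
import math

A_LATTICE = 4.3628  # Å, lattice parameter from Table 1

# (h, k, l) -> observed d-spacing in Å, copied from Table 1
OBSERVED = {
    (1, 1, 1): 2.5190, (2, 0, 0): 2.1817, (2, 1, 0): 1.9508, (2, 1, 1): 1.7821,
    (2, 2, 0): 1.5421, (3, 1, 1): 1.3151, (2, 2, 2): 1.2600, (3, 2, 0): 1.2093,
}


def d_cubic(a, hkl):
    """Interplanar spacing of the (hkl) planes in a cubic lattice with edge a."""
    h, k, l = hkl
    return a / math.sqrt(h * h + k * k + l * l)


for hkl, d_obs in OBSERVED.items():
    d_cal = d_cubic(A_LATTICE, hkl)
    print(hkl, f"d_cal = {d_cal:.4f} Å", f"d_cal - d_obs = {d_cal - d_obs:+.4f} Å")
```

All eight reflections are reproduced to within about 0.001 Å by a single lattice parameter, which is what indexing to a cubic cell requires.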

line is a guide to the eye). c, Phonon dispersion relations sampling along
high-symmetry points in the Brillouin zone (Γ-X-M-Γ-R) at 76 GPa.
d, The electron localization function distribution modelled in the FeO2
crystal lattice.


elements can vary greatly in regions of subduction crust”®, upwelling 
plumes”®, Dp” layer’, and many other small patches. Solid diffusion is 
extremely inefficient”’, and cannot eliminate the oxygen inhomoge- 
neity between a P-phase-rich patch and the adjacent rocks kilometres 
away, even at the mantle temperatures occurring through geological 
time. The oxygen fugacity in the deep lower mantle is probably as 
inhomogeneous as that in the crust, which ranges from the oxygen 
fugacity of the iron-wüstite buffer”? to that of free O2 in the air. The
oxygen-rich P-phase could exist in pockets, locally or regionally. 

The scenarios of spatial and temporal heterogeneity of oxygen in the 
deep lower mantle have far-reaching implications. The oxygen-rich 
patches may lead to new phase assemblages with very different min- 
eralogical, chemical and physical signatures from the nominally 
bridgmanite-ferropericlase mantle. The new phase assemblages may 
be responsible for many unexplained seismic and geochemical anom- 
alies in the deep lower mantle and the D” layer. The new scenario 
introduces a complex picture of the deep lower mantle that calls for 
in-depth study of the P-phase. 

The Great Oxidation Event marked the permanent rise of the O2 
level in the atmosphere, which did not previously contain free O2. It
is thought to have occurred 2.4-2.1 billion years ago (ref. 30), on the basis
of evidence such as the appearance of highly oxidized red soil, the
disappearance of easily oxidized FeS2 pyrite (ref. 31), and the disappearance
of distinctive non-mass-dependent sulphur isotope fractionations (ref. 32).
In addition to the proposed biogenic origin of the O2, the emergence 
of P-phase-bearing patches could provide an extra, eventful, abiotic 
source of O2. Whether the strong uprising of the P-phase patches
2.4-2.1 billion years ago was an accidental, sporadic event or was 
triggered by some geodynamic instability would be very interesting 
to know. Exploration of these hypotheses, however, requires further 
investigations of the physical, chemical, and mineralogical properties 
of the P-phase under the deep lower-mantle conditions. 

Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.

METHODS

XRD of haematite in O2. Angular dispersive XRD experiments were performed at 
the 16-BMD and 16-IDB stations of the High-Pressure Collaborative Access Team 
(HPCAT) and the 13BM-C station of the GeoSoilEnviroCARS, at the Advanced 
Photon Source, Argonne National Laboratory.

High-purity Fe2O3 powders from Alfa Aesar (MFCD00011008, 99.99%
purity) were annealed at 1,296 K for 12 h to eliminate the absorbed water. The
dehydrated samples were compressed into patties of size 60 µm (length) × 60 µm
(width) × 10 µm (thickness), and loaded in a DAC. We used diamond anvils with
culets of 200 µm to access pressures above 70 GPa. The sample chamber was a
100-µm-diameter hole drilled in a pre-compressed rhenium gasket. The DAC was
placed in a sealed container immersed in liquid nitrogen. O2 gas was piped into
the container. Liquefied O2 infused into the sample chamber, and served as both
oxidant and pressure medium after the chamber was sealed by compression. The
initial pressure after loading O2 was 8 GPa.

The compression rate from 8 to 78 GPa was as slow as 5 GPa per hour. During
compression, the Raman signals from samples reproduced the literature results of
Fe2O3 (ref. 33) and solid O2 (ref. 8). We checked the structure of compressed Fe2O3
at 78 GPa, and verified that the diffraction pattern agreed with the Rh2O3-type
Fe2O3 phase (ref. 34). The sample was then heated by a double-sided laser system (ref. 7) to
promote chemical reaction. The heating temperature reached 1,800 K on both
sides of the sample, measured by fitting the black-body radiation curve. After
laser heating and quenching to ambient temperature, the sample pressure was
equilibrated at 76 GPa.

To confirm the synthesis conditions for FeO2, we implemented separate runs
at different pressures (56-81 GPa) using the same assemblage. We gradually
increased the laser-heating power and monitored the change of XRD pattern. FeO2
was observed above 75 GPa when the temperature was raised above 1,600 K, which
represented the kinetic barrier of the reaction. FeO2 was not observed at pressures
below 72 GPa even at temperatures as high as 2,050 K, thus defining the lower
stability limit of FeO2 (Extended Data Fig. 3a).

Diffraction patterns performed at 16-BMD (HPCAT) with a monochromatic 
X-ray energy of 40.1 keV were collected on a Mar 345 image plate detector while the 
DAC was rotated so that the sample-beam angle varied from —7° to 7°. The P-phase 
is identified (Fig. 2). We performed additional XRD experiments at 13BM-C with 
rotation angle —13.0° to 12.5° and X-ray energy of 28.5 keV. Observation of the 
P-phase was well reproduced with the same experimental conditions. Pressure was 
determined by calibrating the derivative shift of the diamond Raman mode in an 
offline Raman system, confirmed by the vibron mode frequency of ε-O2 (ref. 8).
The pressure uncertainty is up to ±3 GPa, derived from the difference between
diamond line shifts and O2 vibron mode shifts. 
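
The X-ray energies given in these Methods can be converted to wavelengths with λ = hc/E, and a wavelength can be combined with a d-spacing through Bragg's law, λ = 2d sin θ. This is a minimal sketch; the constant hc ≈ 12.398 keV·Å is a standard value rather than one taken from the paper, and the function names are illustrative.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # h*c in keV·Å


def wavelength_angstrom(energy_kev):
    """X-ray wavelength in Å for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def two_theta_deg(d_angstrom, wavelength):
    """Diffraction angle 2θ in degrees from Bragg's law, λ = 2 d sin θ."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d_angstrom)))


for energy in (20.0, 28.5, 40.1):
    print(f"{energy:5.1f} keV -> {wavelength_angstrom(energy):.4f} Å")

# Expected position of the (111) reflection (d = 2.5190 Å from Table 1)
# for the 0.4344 Å wavelength quoted in the Table 1 notes.
print(f"2θ(111) = {two_theta_deg(2.5190, 0.4344):.2f}°")
```

The 20.0 keV beam corresponds to the 0.6199 Å wavelength given in the Fig. 1 caption, and 28.5 keV is close to the 0.4344 Å given in the Table 1 notes.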

Decompress FeO . The sample was decompressed to the ambient condition from 
76 GPa at a rate of 8 GPa per hour. The P-phase was recognized in diffraction pat- 
terns until 41 GPa, where the signature (111) and (200) becomes weak in intensity. 
The FeO; structure is eventually invisible at 31 GPa (Extended Data Fig. 2). Three 
newly emerged peaks are possibly associated with the low-pressure haematite. 
Multigrain single-crystal XRD. To identify the new unit cell of an unknown
phase, the conventional powder XRD method uses only the Miller indices–Bragg
angle relationship (hkl–2θ) and the answer is often non-unique. Like the
single-crystal XRD method, the multigrain crystallography method requires all
observable Miller indices simultaneously to satisfy the stringent geometrical
relation among 2θ, ω (rotation axis perpendicular to the incident X-ray beam), and
η (rotation axis parallel to the incident X-ray beam) within a tight uncertainty
range, so the unit-cell assignment is absolutely definitive. In addition, the
multigrain crystallography method has a statistical advantage over single-crystal
XRD owing to the large number of crystals, and a coverage advantage in giving
access to orientations that can be blocked for a single crystal by the
limited DAC opening.
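
The consistency requirement described above can be illustrated with a small numerical check. The Python sketch below converts a measured Bragg angle 2θ into an observed d-spacing and compares it with the d-spacing expected for a cubic cell with indices hkl. The function names are illustrative and are not taken from the original analysis; the wavelength, lattice parameter and reflections are those reported for the first grain in Extended Data Tables 1a and 2. It tests only the hkl–2θ part of the relation, not the ω and η geometry.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å


def wavelength_from_energy(energy_kev):
    """X-ray wavelength (Å) for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def d_from_two_theta(two_theta_deg, wavelength):
    """Observed d-spacing (Å) from Bragg's law, lambda = 2 d sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def d_cubic(a, h, k, l):
    """Calculated d-spacing (Å) of reflection hkl in a cubic cell of edge a."""
    return a / math.sqrt(h * h + k * k + l * l)


# Values reported for grain 1 (Extended Data Tables 1a and 2).
WAVELENGTH = 0.4344   # Å
A_GRAIN1 = 4.3618     # Å
REFLECTIONS = [((0, 2, 0), 11.43), ((0, 2, 0), 11.44), ((1, 1, 3), 19.02)]

for hkl, two_theta in REFLECTIONS:
    d_obs = d_from_two_theta(two_theta, WAVELENGTH)
    d_calc = d_cubic(A_GRAIN1, *hkl)
    print(hkl, round(d_obs, 3), round(d_calc, 3), round((d_obs - d_calc) / d_calc, 4))
```

Because the tabulated 2θ values are rounded to 0.01°, agreement at the level of about one part in a thousand is all that should be expected from this check.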

Multigrain single-crystal XRD experiments were implemented at the
BL15U1 station of the Shanghai Synchrotron Radiation Facility and at 13BM-C,
GeoSoilEnviroCARS. At the Shanghai Synchrotron Radiation Facility, diffraction
patterns were collected on a charge-coupled device detector with a nano-focused
incident beam (beam size of 1.5 μm × 2 μm) and a beam energy of 20.0 keV. A total
of 50 images were collected by rotating the sample-beam angle from −12.5°
to 12.5°, with a scanning step of 0.5°. For experiments performed at 13BM-C, the
incident X-ray beam was 15 μm × 15 μm in size and 28.5 keV in energy. The
sample-beam angle ranged from −13.0° to 12.5°. The diffraction peak indexing and
d-spacings for five selected FeO2 crystal grains are summarized in Extended Data
Table 1a–e, respectively. A short summary of the multigrain diffraction results is
shown in Extended Data Table 2.

XRD experiment on FeOOH. For the FeOOH experiment, angular dispersive
XRD experiments were performed at the 16-IDB station of HPCAT and the
13BM-C station of GeoSoilEnviroCARS. High-purity α-FeOOH powders were
purchased from Alfa Aesar (MFCD00064782, 99+% purity). The powder sample
was pre-compressed to 35 μm (length) × 35 μm (width) × 10 μm (thickness)
and quickly loaded into the DAC without annealing. Diamond anvils with a culet
diameter of 150 μm (bevelled from 300 μm) were used to reach high pressures. The
sample was placed in a hole of diameter 95 μm, and sealed with a rhenium gasket.
Neon was loaded as the hydrostatic pressure medium and thermal insulator. The
pressure after gas-loading was 0.9 GPa, derived from the lineshift of ruby fluorescence.

During the compression process, pressure was first calibrated by ruby
fluorescence and cross-checked by diffraction peaks of solid neon** and the rhenium
gasket edge. At higher pressures, where ruby fluorescence modes were hard to
detect, the lowest derivative value of the diamond Raman peak was taken as an
additional calibration. After laser heating, diffraction peaks of neon were shifted,
owing to the presence of hydrogen, and could not be used as a reliable pressure
calibrant. Therefore only the lowest derivative of the diamond Raman peak and the
diffraction pattern taken at the edge of the rhenium gasket were considered for
pressure calibration. The pressure uncertainty is as large as ±5 GPa.

XRD patterns were collected at stations 13BM-C of GeoSoilEnviroCARS and
16-IDB of HPCAT. FeOOH powder in Ne was compressed to 92 GPa (compression
rate ~10 GPa per hour). The compressed sample was laser heated for 10 min.
Sample pressure decreased to 87 GPa after quenching to ambient temperature. A
Raman peak around 5,180 cm−1 (Fig. 3) associated with the Q1 vibron of hydrogen
was observed, showing the diffusion of hydrogen into neon crystals'**. A portion
of the heated sample had transformed into FeO2 (Fig. 2b and Extended Data Fig. 4).

Additional in situ laser heating experiments were conducted to investigate
the stability field of FeOOH. From 58 GPa to 114 GPa, the sample was heated by a
double-sided laser and we probed the P-phase by both XRD and Raman spectroscopy.
We constrained the decomposition pressure of FeOOH to between 78 GPa and 87 GPa
(Extended Data Fig. 3b).

Ab initio crystal structure searching. The first-principles calculations were
performed in the framework of density functional theory***” using the VASP package**.
The generalized gradient approximation of Perdew, Burke, and Ernzerhof was
used to describe the exchange–correlation functional***°. Pseudopotentials
were used with eight valence electrons for Fe atoms (3d⁷4s¹) and six for O atoms
(2s²2p⁴). Energy convergence and k-points mesh. For the crystal structure search,
we used USPEX” with a plane-wave basis set cutoff energy of 800 eV.

Phonon dispersion curves from first-principles. Phonon calculations were
conducted based on density functional perturbation theory*! implemented in VASP
software in connection with the Phonopy software’. An 8 × 8 × 8 q-mesh (phonon
momentum space) was used for mapping a 2 × 2 × 2 supercell of FeO2. We found
that phonon dispersions are also stable at 0 GPa and 300 GPa, as shown in Extended
Data Fig. 5.

Extended Data Figure 1 | Bonding lengths and angles in pyrite-type FeO2 at 76 GPa.
The structure is viewed along the x axis of the experimental FeO2 unit cell.

Extended Data Figure 2 | XRD pattern series by decompressing the P-phase. The
P-phase becomes weak in intensity at 41 GPa and totally disappears at 31 GPa and
below. The decompressed sample eventually recovered to the α-Fe2O3 phase at low
pressure. P indicates the P-phase FeO2; O indicates solid O2; H indicates α-Fe2O3
(haematite). Fe2O3 contains post-perovskite type and Aba2-structured high-pressure
phases.

Extended Data Figure 3 | Synthesis pressure–temperature conditions for FeO2.
a, Open circles indicate the coexistence of Fe2O3 and O2. Solid squares indicate
the appearance of P-phase FeO2. FeO2 was synthesized between 72 GPa and 75 GPa.
b, Open circles indicate FeOOH. Decomposition pressure was constrained between
78 GPa and 87 GPa. Sample temperature was measured using the spectroradiometric
method and errors are estimated from the goodness of fit to the spectroradiometric
profile.

Extended Data Figure 4 | XRD patterns of FeOOH through the experimental
pressure–temperature path. a, Goethite sample (G) in neon and Re gasket at 0.9 GPa.
b, As in a, compressed to 35 GPa. The XRD peaks include goethite, solidified Ne,
and the Re gasket. c, As in a, compressed to 92 GPa. The sample peaks remain and
shift to higher Q. d, After laser-heating and quenching to ambient temperature, the
pressure dropped to 87 GPa, and the goethite peaks disappeared. The new pattern
consists of peaks of the P-phase, Ne and minor amounts of ε-FeOOH (red stars).

Extended Data Figure 5 | Phonon dispersion relations of FeO2 P-phase. a, At 0 GPa;
b, At 300 GPa. The P-phase is mechanically stable at 0 GPa and 300 GPa.

Extended Data Table 1 | Additional multigrain XRD data obtained at 13BM-C

a
   h    k    l    2θ (°)   ω (°)    η (°)    d_obs (Å)   d_calc (Å)   Δd/d
   0    2    0    11.43    -11.1     30.5    2.181       2.181         0.000
   0    2    0    11.44     11.3    210.4    2.181       2.181         0.000
   0    2    1    12.79      6.9      5.9    1.952       1.951         0.001
  -1   -1   -3    19.03     -7.5    103.5    1.314       1.315         0.000
   1    1    3    19.02     12.0    283.5    1.315       1.315         0.000

b
   h    k    l    2θ (°)   ω (°)    η (°)    d_obs (Å)   d_calc (Å)   Δd/d
  -1    1    1     9.88      4.9    281.5    2.522       2.520         0.001
   1   -1   -1     9.89     -5.2    101.3    2.520       2.520         0.000
   0    2    0    11.42     -5.9     46.3    2.183       2.183         0.000
   0    2    0    11.42      9.8    226.3    2.183       2.183         0.000
   2    0    2    16.18     10.0    317.2    1.541       1.543        -0.001

c
   h    k    l    2θ (°)   ω (°)    η (°)    d_obs (Å)   d_calc (Å)   Δd/d
  -1   -1    1     9.89     -8.6     82.5    2.520       2.520         0.000
   1    1   -1     9.88      1.4    262.7    2.521       2.520         0.001
  -1   -2    0    12.78      0.5    120.8    1.952       1.952         0.000
   2    2   -2    19.85      6.4    262.5    1.260       1.260         0.000

d
   h    k    l    2θ (°)   ω (°)    η (°)    d_obs (Å)   d_calc (Å)   Δd/d
  -1    1   -1     9.89      0.0    225.1    2.520       2.519         0.000
   0   -2    1    12.78     -4.9     83.7    1.952       1.951         0.001
   0    2   -1    12.79      8.0    263.7    1.950       1.951         0.000
  -1   -3    1    19.01     -5.2    103.0    1.315       1.315         0.000

e
   h    k    l    2θ (°)   ω (°)    η (°)    d_obs (Å)   d_calc (Å)   Δd/d
   1   -1   -1     9.9       3.4    143.0    2.517       2.518         0.000
  -1   -2    0    12.79     -7.5     68.5    1.950       1.950         0.000
   1    2    0    12.8       6.5    248.6    1.949       1.950        -0.001
  -1    3    1    19         6.0    296.5    1.316       1.315         0.001

a, Single-crystal XRD peaks for a selected P-phase crystallite with unit-cell
parameter a = 4.3618(6) Å at 76 GPa (space group Pa3̄, incident beam wavelength
λ = 0.4344 Å). The Bragg angle 2θ, rotation angle ω and azimuthal angle η are
calculated from the orientation matrix. b–e, XRD indexing for the second, third,
fourth and fifth P-phase crystallites, with unit-cell parameters a = 4.3649(8) Å,
a = 4.3640(8) Å, a = 4.3625(7) Å and a = 4.3616(8) Å, respectively. The d-spacings
are as follows: d_obs are the observed peak positions, d_calc are calculated from
lattice parameters, Δd = d_obs − d_calc. The estimated standard deviations are
included in parentheses.

Extended Data Table 2 | The lattice parameters and atomic positions of pyrite-type FeO2 at 76 GPa

               a (Å)        Volume (Å³)
  Grain 1      4.3618(6)    82.98(4)
  Grain 2      4.3649(8)    83.16(6)
  Grain 3      4.3640(8)    83.11(5)
  Grain 4      4.3625(7)    83.03(4)
  Grain 5      4.3616(8)    82.97(6)
  Powder       4.3628(1)    83.04(1)
  Simulation   4.331        81.23

Lattice parameters are calculated from five selected grains with X-ray wavelength
0.4344 Å. Diffraction pattern is treated and averaged as powder rings for Rietveld
refinements. The Fe atoms are in 4a Wyckoff positions with fractional coordinates
(0, 0, 0), (1/2, 0, 1/2), (0, 1/2, 1/2), (1/2, 1/2, 0) and oxygen in 8c positions
(u, u, u), (−u + 1/2, −u, u + 1/2), (−u, u + 1/2, −u + 1/2), (u + 1/2, −u + 1/2, −u),
(−u, −u, −u), (u + 1/2, u, −u + 1/2), (u, −u + 1/2, u + 1/2), (−u + 1/2, u + 1/2, u)
with u = 0.3746(1). Simulation lattice parameters are extracted from the
lowest-energy structure search by USPEX.

Extended Data Table 3 | Bonding lengths and angles in pyrite-type FeO2 at 76 GPa

                              Experiment    Simulation
  Pressure (GPa)              76            76
  Temperature (K)             297           0
  <Fe-O> (Å)                  1.792(5)      1.781
  <O-O> (Å)                   1.937(11)     2.077
  Bond angle: O-Fe-O (°)      95.6(1)       96.52
  Bond angle: Fe-O-O (°)      99.1(3)       96.82

Both experimental and computational bonding information is summarized. Uncertainties are calculated from structural refinement.

Homo floresiensis-like fossils from the early Middle Pleistocene of Flores

Gerrit D. van den Bergh, Yousuke Kaifu, Iwan Kurniawan, Reiko T. Kono, Adam Brumm, Erick Setiyabudi,
Fachroel Aziz & Michael J. Morwood


The evolutionary origin of Homo floresiensis, a diminutive hominin 
species previously known only by skeletal remains from Liang Bua 
in western Flores, Indonesia, has been intensively debated. It is 
a matter of controversy whether this primitive form, dated to 
the Late Pleistocene, evolved from early Asian Homo erectus and 
represents a unique and striking case of evolutionary reversal in 
hominin body and brain size within an insular environment!*. 
The alternative hypothesis is that H. floresiensis derived from 
an older, smaller-brained member of our genus, such as Homo 
habilis, or perhaps even late Australopithecus, signalling a hitherto 
undocumented dispersal of hominins from Africa into eastern 
Asia by two million years ago (2 Ma)>°. Here we describe hominin 
fossils excavated in 2014 from an early Middle Pleistocene site 
(Mata Menge) in the So’a Basin of central Flores. These specimens
comprise a mandible fragment and six isolated teeth belonging to 
at least three small-jawed and small-toothed individuals. Dating to 
~0.7 Ma, these fossils now constitute the oldest hominin remains 
from Flores’. The Mata Menge mandible and teeth are similar 
in dimensions and morphological characteristics to those of 
H. floresiensis from Liang Bua. The exception is the mandibular 
first molar, which retains a more primitive condition. Notably, 
the Mata Menge mandible and molar are even smaller in size 
than those of the two existing H. floresiensis individuals from 
Liang Bua. The Mata Menge fossils are derived compared with 
Australopithecus and H. habilis, and so tend to support the 
view that H. floresiensis is a dwarfed descendant of early Asian
H. erectus. Our findings suggest that hominins on Flores had 
acquired extremely small body size and other morphological traits 
specific to H. floresiensis at an unexpectedly early time. 

This paper reports morphological analyses of hominin fossil
materials excavated from the open site of Mata Menge in 2014
(ref. 7) (Extended Data Table 1). Mata Menge is one of the Middle
Pleistocene fossil-bearing localities in the So’a Basin, and is situated
74 km east-southeast of Liang Bua. The specimens (n = 7) under study
were recovered in situ from the upper part of a lens-shaped fluvial
sandstone unit (Layer II) measuring up to 30 cm in thickness. Layer II
is capped by a 6.5-m-thick sequence of clay-rich volcanic mudflows
(Layer Ia–f) that filled in the stream valley and effectively sealed off
Layer II (ref. 7). All hominin fossils were excavated within a maximum
linear distance of 15 m. They are associated with stone tools and the
fossil remains of dwarfed proboscideans (Stegodon florensis), murine
rodents, Komodo dragons, and other insular fauna of Flores. The age
of Layer II is constrained to between 0.65 and 0.8 Ma, using ⁴⁰Ar/³⁹Ar
dating and other methods of age determination’. The hominin fossils
display some minor dissolution pitting; generally, however, the surface
preservation of these specimens is quite good.


SOA-MM4 is a right mandibular corpus (Fig. 1). Despite its small
size, we conclude that this partial mandible comes from an adult
individual, and that the preserved alveoli represent M1 to M3. Only the
lingual wall of the mesial alveolus remains for M1 (Extended Data
Fig. 1a). This alveolus is not for P4 because the mandibular canal that
normally exits in the area below P3–M1 of a hominin mandible continues
further anteriorly beyond this level (Extended Data Fig. 1c, h).
Micro-computed tomography (CT) scan data indicate that the alveolus for
the last molar supported a plate-like mesial root and a conical distal
root which together tilt distally, a form typical for a hominin M3 root
(Extended Data Fig. 1h, i). Distal to it, the alveolar bone bears no
evidence of a tooth germ. The bottoms of the long M3 alveoli come
close to the mandibular canal and display tapering shapes, indicating
that its root formation was fully or at least nearly completed.

The lateral corpus is the smallest in our sample, being 21-28% lower 
and narrower than in the two existing H. floresiensis mandibles from 
Liang Bua (LB1, LB6/1: Extended Data Fig. 2a, b). The lateral corpo- 
ral surface of SOA-MM4 is damaged, but its cross-sectional shape 
(Fig. 1d, Extended Data Fig. 1e) clearly indicates the absence of an
Australopithecus-like hollow, and the presence of a prominent supe- 
rior lateral torus, a feature characteristic of Homo®®. Mandibles of 
Australopithecus and to a lesser extent those of H. habilis sensu lato are 
characterized by a robust and strongly everted lateral corpus as well 
as a wide extramolar sulcus, in association with their narrow dental 
arcades and the resultant horizontal separation between the lateral 
mandibular corpus and the ramus’”!". These features are lacking in 
SOA-MM4, which has a comparatively thin, vertically oriented lat- 
eral corpus with a narrow extramolar sulcus that is evident from the 
medially located anterior ramus root (Extended Data Fig. 1). Such fea- 
tures became apparent in post-1.7-million-year-old (Myr old) Homo, 
including early Javanese H. erectus and H. floresiensis (Extended Data 
Fig. 3). Similarities between SOA-MM4 and the corresponding mor- 
phology of H. floresiensis extend to other features such as the near 
parallel alveolar margin and mandibular base, a moderate lateral 
prominence, and a gently hollowed masseteric fossa with a coarse, 
curved line for the masseter muscle attachment (Extended Data 
Fig. 4). Multivariate analyses based on the small number of available
linear measurements also support our hypothesis that SOA-MM4
is at least different from Au. afarensis, and is similar to H. floresiensis
in corpus shape (Extended Data Fig. 5).

The 2014 fossil assemblage from Mata Menge includes six isolated
hominin teeth from three or more individuals (Fig. 2; Extended
Data Fig. 1j, k; Extended Data Table 1; Supplementary Information).
Crown and root measurements available from three permanent teeth
(left I¹, right P³, and left M1 (or M2)) are small and similar to or
slightly smaller than those of H. floresiensis (Extended Data Table 2,
Extended Data Fig. 2c–f). The broken root of the I1/2 is also equally
small (Extended Data Fig. 1k), although comparative measurements
are unavailable from this specimen, which was used for direct
uranium-series dating’. Morphologically, the Mata Menge teeth display the
following primitive features: (i) a lingually (I¹, I1/2) or distally (P³)
bevelled, worn occlusal surface that suggests tilted anterior dentition and
substantial prognathism (Extended Data Fig. 1j); (ii) a pronounced
P³ lingual cusp whose mesiodistal diameter compares with that of the
buccal cusp!*!*; and (iii) a mid-trigonid crest on M1. These features
are frequently observed in Early Pleistocene African and Eurasian
Homo (that is, H. habilis sensu lato and H. erectus sensu lato), and the
third character became frequent in H. erectus and some later groups
of archaic Homo*'». Liang Bua H. floresiensis shares the first and
probably the third characteristics, although the second is not evident on
the worn Liang Bua premolars!®. Most features of the Mata Menge
I¹ and P³ are not useful for assessing taxonomic affinities relative to
H. habilis or H. erectus (Supplementary Information), although the
absence of the P³ buccal groove is a condition that appeared in post-habilis
grade Homo?. The Mata Menge and Liang Bua hominins also share a
bifurcated, fused P³ root form.

Figure 1 | SOA-MM4 mandible compared with a Liang Bua H. floresiensis
specimen. a–d, Superior (a), lateral (b), inferior (c), and
anterior (d) views. e, Lateral view of the LB6/1 mandible. M1, first molar;
M2, second molar; M3, third molar; MC, mandibular canal; aMF, accessory
mental foramen. Scale bar, 10 mm.

We digitally reconstructed the broken M1 (or M2) crown
(SOA-MM1) based on its micro-CT scan (Fig. 3a). Both linear metric
and crown contour analyses of the M1s showed that this five-cusped
tooth is moderately long and is close to the average M1 shape of early
Javanese H. erectus, but is different from the elongated H. habilis-like
forms!” (Fig. 3b, Extended Data Fig. 2e). SOA-MM1 lacks two of the
most peculiar, derived characteristics of the Liang Bua H. floresiensis
M1s: a reduced cusp number (five to four) and a mesiodistally (MD)
shortened crown configuration®. The above comparative morphology remains
largely the same even if SOA-MM1 is an M2, although these analyses do not
clearly separate H. floresiensis from early Javanese H. erectus (Fig. 3c;
Extended Data Fig. 2f).

Figure 2 | Isolated teeth from Mata Menge. a, SOA-MM2 (left I¹).
b, SOA-MM5 (right P³). c, SOA-MM1 (left M1). d, SOA-MM7 (left dc).
e, SOA-MM8 (right dc). In each row, from left to right, occlusal, buccal
(labial), lingual, mesial, and distal (except for c) views. Scale bar, 10 mm.

The two deciduous canines (dcs) from Mata Menge are
much smaller than those of H. sapiens (n = 63), H. erectus (n = 1), and
Australopithecus (n = 6), but do not display the relatively high crown
shape that characterizes the latter genus (Extended Data Fig. 6a, b).
In a principal component analysis (PCA) based on five size-adjusted
linear measurements (Extended Data Fig. 6c–e), PC1 separates
Australopithecus-like primitive (a high crown with a low distal
shoulder) and modern human-like derived (a low crown with a high
distal shoulder) morphologies. Allometry does not explain this
inter-taxon difference because the dc crown sizes are similar between the
two taxa. SOA-MM7, the minimally worn dc from Mata Menge, is
positioned in between Australopithecus and H. sapiens in PC1. There
are no deciduous teeth in the existing H. floresiensis assemblage from
Liang Bua.

Figure 3 | CT-based reconstruction of SOA-MM1 and the results of
Elliptic Fourier Analysis of the molar crown contour. a, Occlusal (left)
and buccal (right) views of the reconstructed SOA-MM1, and a horizontal
CT section (central) at the level indicated in the buccal view. Scale bar,
5 mm. Plots of the PC scores for the first molar (b) and the second molar (c)
analyses. Proportions of the variance explained by each PC are given in
parentheses.


The above findings shed new light on the origin and evolution of 
Late Pleistocene H. floresiensis. Notably, the 0.7-Myr-old Mata Menge 
hominins are similar to Late Pleistocene H. floresiensis of Liang Bua 
in dentognathic size and morphology, but the former lacks several 
derived molar morphologies of the latter. This suggests that the early 
Middle Pleistocene hominins of the So’a Basin were directly ancestral 
to Liang Bua H. floresiensis. Further support for this view is provided 
by the following observations: (i) stone technologies at Mata Menge 
and Liang Bua are markedly similar, implying a period of techno- 
logical continuity spanning at least several hundred millennia’’; 
(ii) there is no evidence for a faunal turnover during the time interval 
separating the fossil records of the So’a Basin and Liang Bua!’; and 
(iii) H. floresiensis lacks a series of derived cranial features of chrono- 
logically late H. erectus from Java, such as specimens from Ngandong, 
Sambungmacan, and Ngawi (all of which are presumably from the 
late Middle to Late Pleistocene period)*”°. We conclude that the most 
reasonable taxonomic assignment for the Mata Menge fossils is to 
H. floresiensis, although this remains a provisional interpretation until 
new skeletal materials are found. 

Concerning the origins and evolutionary relationships of
H. floresiensis, we note that the Mata Menge mandible and teeth
are morphologically derived compared with Australopithecus and
H. habilis, with their primitive aspects comparable to post-habilis grade
Early Pleistocene Homo. This is most consistent with the hypothesis
that H. floresiensis originated from a population whose closest
affinities are with early Javanese H. erectus (>1.2–0.8 Ma), whose femoral
length is 55–61% longer than, and absolute brain size about twice as large
as, that of H. floresiensis*!-?3. Additional support for this includes reports
that the earliest evidence for hominins on Flores (~1.0 Ma)** does
not exceed the oldest record of H. erectus on Java (>1.2 Ma)>”®, and
recent detailed analyses of the craniodental morphology of Liang Bua
H. floresiensis®*'°. Given how little is known about the distribution
of early H. erectus on the ancient ‘Sunda shelf’, it remains an open
question whether the founding population crossed to Flores in a
west-to-east direction from Java, or via a northern route from the Wallacean
island of Sulawesi?”~”’.

It is noteworthy that the mandible and teeth from Mata Menge are 
slightly smaller than the two H. floresiensis individuals from Liang Bua 
(LB1 and LB6/1). While this could indicate a slight body size increase 
over time, it may also simply reflect intra-population variation in the 
Mata Menge and Liang Bua hominin groups. Whichever the case, it 
would appear that the Flores hominins had acquired extremely small 
dentognathic size during the time span of at least 300 millennia follow- 
ing the initial colonization of Flores, assuming that the oldest artefacts 
from Flores—dated to at least ~1 Ma7+—were produced by large- 
bodied ancestors of the Mata Menge hominins. This apparently very 
fast transformation in hominin body size is surprising. Although no 
other documented examples of rapid island dwarfing exist for pri- 
mates, we note that red deer from the island of Jersey had reduced to 
one-sixth of the body size in the ancestral population within about 
six millennia*”. Flores may have been an exceptional case; however, 
the fossil evidence from Mata Menge highlights how quickly major 
evolutionary changes could have occurred in hominin populations 
cut off on isolated and impoverished islands of Wallacea. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


Comparative samples. Comparative fossil samples include the proposed two 
major ancestral candidates for H. floresiensis, H. habilis sensu lato (East Africa) 
and early Javanese H. erectus (Java), as well as other Asian archaic Homo (Dmanisi 
Homo and Middle Pleistocene Asian Homo), and H. floresiensis from the Late 
Pleistocene of Liang Bua (Extended Data Table 3). No Australopithecus molar 
samples were included in the comparison with SOA-MM1 because the latter is 
obviously derived in having a well-developed mid-trigonid crest’® and a gently 
convex, non-bilobed, buccal crown outline®. However, in view of the previous 
claim that the H. floresiensis mandibles resemble Australopithecus afarensis 
(ref. 6, but see ref. 2), the mandibular analysis includes specimens belonging to this 
species as well as the recently reported ‘earliest Homo’ specimen from Ethiopia’. 
The deciduous tooth analyses also include Australopithecus specimens because no 
measurable mandibular deciduous canines are represented in the existing fossil 
collections of H. habilis to represent the primitive condition in this tooth. Two 
worn H. erectus specimens from the Zhoukoudian Lower Cave are included in 
the metrical comparison, but could not be included in the PCA; nor are there any 
Javanese H. erectus specimens of this tooth known. 

Debate continues over whether H. habilis sensu lato includes diverse evolving 
lineages*!*4, but we pooled the relevant specimens from East Africa for the present 
purpose to recognize primitive morphological condition in Homo. 

The early Javanese H. erectus sample is from varying stratigraphic levels at
Sangiran, Central Java, which are dated to between >1.2 and 0.8 Ma*>”®. Previous
studies demonstrated significant temporal decrease in tooth crown size within this 
sample!!56, but we nevertheless pooled these chronological subgroups for the pur- 
poses of this study because their crown shapes are remarkably similar to each other’. 

The other Asian archaic Homo samples (such as Dmanisi Homo, and various 
groups of the Middle Pleistocene East Asian Homo, as listed in Extended Data 
Table 3) were included in the linear metric analyses of the mandible and teeth. 

Our H. sapiens sample is from Africa, Europe, Asia, and Oceania, with
particular emphasis on prehistoric individuals from Southeast Asia, including Flores, as
well as modern small-bodied populations (such as Philippine ‘Negrito’, Andaman,
African ‘Pygmy’, and ‘Bushman’) (Extended Data Table 4). This choice was made to
reflect species-wide variation of H. sapiens, and to respond to the claim that Liang
Bua H. floresiensis resembles a local short-statured Australomelanesian
population*”. Sexes were pooled due to general difficulties in sex assignment for various
fragmentary hominin fossils.

Materials. The data of the mandibles were taken from the original specimens, 
but some notes should be made on the materials for the dental analyses. Dental 
specimens with severe tooth wear were excluded from the metric analyses. Metric 
and non-metric data were obtained from the original specimens, plaster casts, or 
published studies (Extended Data Table 3). For all the H. floresiensis, early Javanese 
H. erectus, and H. sapiens specimens, high-quality ‘isolated’ plaster casts were prepared 
by Y.K. with partial assistance from Hisao Baba. Silicone was used for molding and 
the produced plaster cast of a dentition was then cut with a saw to isolate individual 
teeth. Such isolated casts can be measured more easily and accurately than the original 
specimens when the teeth are embedded in the jaw bones and measurement equip- 
ment is difficult to apply to the original specimens. Thus, we used these isolated casts 
for linear measurement and crown contour extraction. Non-isolated, high-quality 
plaster casts were used for most of the H. habilis and H. ergaster specimens. These
were prepared by Gen Suwa with dimensional accuracies being within ±0.1 mm**.
Measurements. A digital sliding caliper (Mitutoyo) was used for linear measure- 
ments. Mesiodistal (MD) and buccolingual (BL) tooth crown diameters were 
recorded with allowance for wear, following the methods outlined in ref. 39. Values 
from the right and left sides are averaged for the fossil specimens, while the data for 
H. sapiens are from the better-preserved and/or less-worn side. All the metric data were 
taken by Y.K., with the exception of those cited from the literature (refs. 15, 39-42). 
CT scan. The Mata Menge hominin fossils reported here were CT scanned
using the microfocal X-ray CT system TXS320-ACTIS (TESCO, Japan) at the
National Museum of Nature and Science, Tokyo, in 2014. Original scans were
taken at 189 kV and 0.23 mA with a 0.5-mm-thick copper plate prefilter to lessen
beam-hardening effects. Scanned images were reconstructed into a 512 × 512
matrix of 150 μm pixel size with 150 μm slice interval and thickness (mandible),
or 512 × 512 matrices of 32 μm pixels with 32 μm slice interval and 34.63 μm
slice thickness (isolated teeth).

Size-adjusted PCAs of the mandibular corpus measurements. Principal component 
analysis (PCA) was performed using mandibular corpus heights and widths as varia- 
bles (Extended Data Fig. 5). The size adjustment was done by dividing each raw meas- 
urement by the geometric mean of all the measurements used for each individual. 
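
The size adjustment described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the measurement values are hypothetical and do not come from the study. It shows that dividing by the geometric mean removes overall size, so two individuals that differ only in scale end up with identical shape variables.

```python
import math


def size_adjust(measurements):
    """Divide each raw measurement by the geometric mean of all measurements
    taken on the same individual."""
    if any(value <= 0 for value in measurements):
        raise ValueError("linear measurements must be positive")
    log_mean = sum(math.log(value) for value in measurements) / len(measurements)
    geometric_mean = math.exp(log_mean)
    return [value / geometric_mean for value in measurements]


# Hypothetical corpus heights and widths (mm); the second individual is the
# first one scaled up by a factor of two.
small = [20.0, 15.0, 22.0, 16.0]
large = [40.0, 30.0, 44.0, 32.0]

print(size_adjust(small))
print(size_adjust(large))
```

By construction the product of the adjusted values for any individual is 1, so the adjusted variables carry proportions rather than absolute size.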
Elliptic Fourier analysis (EFA) of the mandibular molar. Occlusal crown
contours of the mandibular molar were analysed by normalized (size-standardized)
EFA‘ (Fig. 3), following the previous analysis of the teeth of H. floresiensis*. This
method was chosen in that study because the H. floresiensis teeth are moderately
worn and retain few homologous landmarks. We performed EFAs for both M1s
and M2s, because the position of SOA-MM1 is indeterminate.

The comparative samples were H. floresiensis, the two major ancestral candidates 
of H. floresiensis (H. habilis sensu lato and early Javanese H. erectus), and H. sapiens. 
Comparisons were made on the images from the right side teeth, or horizontally 
flipped images of the left teeth if the latter side is better preserved. The crown 
contour of each tooth was captured by photography with a dental cast placed so 
that its cervical line is vertical to the axis of the camera lens***”. Local fluctuations 
of the cervical lines were ignored**. A 100 mm macro lens was set to a Canon D40 
digital camera to minimize the parallax effect. Interproximal wear was corrected 
on each photograph before extracting the crown contour. Crown contours were
captured from the digital images, elliptic Fourier descriptors (EFDs) were
obtained, and PCA of the normalized EFDs was conducted using the software
SHAPE 1.3 (ref. 48). Other methodological details for the EFA are available in ref. 3.
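
To make the idea of elliptic Fourier descriptors concrete, the following Python sketch computes the harmonic coefficients of a closed polygonal contour and rescales them by the semi-major axis of the first-harmonic ellipse. It is an illustrative sketch only, not the SHAPE 1.3 implementation: the function names are hypothetical, only size is normalized here (orientation and starting point are left untouched), and the normalization used by SHAPE may differ in detail.

```python
import math


def elliptic_fourier_descriptors(points, n_harmonics):
    """Return [(a_n, b_n, c_n, d_n), ...] for a closed contour given as a list
    of (x, y) vertices; the last vertex is joined back to the first."""
    closed = list(points) + [points[0]]
    segments = []
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        dx, dy = x1 - x0, y1 - y0
        dt = math.hypot(dx, dy)
        if dt > 0.0:  # skip repeated vertices
            segments.append((dx, dy, dt))
    perimeter = sum(dt for _, _, dt in segments)

    coefficients = []
    for n in range(1, n_harmonics + 1):
        scale = perimeter / (2.0 * n * n * math.pi ** 2)
        omega = 2.0 * n * math.pi / perimeter
        a = b = c = d = 0.0
        t_prev = 0.0
        for dx, dy, dt in segments:
            t_next = t_prev + dt
            d_cos = math.cos(omega * t_next) - math.cos(omega * t_prev)
            d_sin = math.sin(omega * t_next) - math.sin(omega * t_prev)
            a += dx / dt * d_cos
            b += dx / dt * d_sin
            c += dy / dt * d_cos
            d += dy / dt * d_sin
            t_prev = t_next
        coefficients.append((scale * a, scale * b, scale * c, scale * d))
    return coefficients


def size_normalize(coefficients):
    """Divide all coefficients by the semi-major axis of the first harmonic."""
    a1, b1, c1, d1 = coefficients[0]
    s = a1 * a1 + b1 * b1 + c1 * c1 + d1 * d1
    det = a1 * d1 - b1 * c1
    semi_major = math.sqrt((s + math.sqrt(max(s * s - 4.0 * det * det, 0.0))) / 2.0)
    return [tuple(v / semi_major for v in harmonic) for harmonic in coefficients]


# Hypothetical outline (arbitrary units) and the same outline enlarged 2.5 times.
outline = [(0.0, 0.0), (4.0, 0.0), (5.0, 2.0), (3.0, 4.0), (0.0, 3.0)]
enlarged = [(2.5 * x, 2.5 * y) for x, y in outline]
print(size_normalize(elliptic_fourier_descriptors(outline, 3)))
print(size_normalize(elliptic_fourier_descriptors(enlarged, 3)))
```

The two printed lists agree, which is the property that lets crown outlines of different absolute size be compared by shape alone before the normalized coefficients are passed to a PCA.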

Size-adjusted PCAs of the mandibular deciduous canine (dc). Principal component
analysis (PCA) was undertaken using five size-adjusted linear crown diameters
(mesiodistal diameter, buccolingual diameter, mesial crown shoulder height, crown 
height, and distal crown shoulder height), as shown in Extended Data Fig. 6c-e. 
The size adjustment was done by dividing each raw measurement by the geomet- 
ric mean of the five measurements for each individual. Crown height of the less 
worn SOA-MM7 can be estimated with some confidence, but moderately worn 
SOA-MM8 was excluded from this analysis. 

Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. 
Extended Data Figure 1 | CT-based images of the SOA-MM4 mandible
and photos of the SOA-MM6 incisor. a-i, SOA-MM4 mandible. Surface-
rendered images of superior (a), lateral (b), inferior (c), lingual (d),
anterior (e), and posterior (f) views. Sagittal (h) and horizontal (i) CT
sections at the planes indicated by the green (g) and red (h) lines. aMF, a
branch of the mandibular foramen; ARR, anterior ramus root; LP, lateral
prominence; M1, M1 alveolus; M2, M2 alveolus; M3, M3 alveolus; MC,
mandibular canal; Mas, line for the masseter muscle attachment; MHL,
mylohyoid line; pbMC, posterior branch of the mandibular canal; SLT,
superior lateral torus. j, k, SOA-MM6 mandibular incisor (I1/2) fragments.
The crown (j, SOA-MM6a) and root (k, SOA-MM6b) fragments were
used for laser ablation uranium-series dating. The specimen was deposited
at least 0.55 Ma. Note the bevelled occlusal wear surface (arrow).
Scale bar, 5 mm.
[Extended Data Figure 2 plots. Legend entries include Au. afarensis, early Javanese H. erectus and MP East Asia (Zhoukoudian); axes are the mesiodistal (MD) and buccolingual (BL) diameters of M1 and M2.]

Extended Data Figure 2 | Linear metric comparisons of the mandibles and permanent teeth. a-e, Scatter plots of the mandibular corporal dimensions
(a, b) and permanent tooth crown diameters (c-e). We identify SOA-MM1 as M1 (e), but there remains a slight possibility that this tooth is M2 (f). Metric
data of SOA-MM4: corpus height at M2, 18 mm; corpus height at M2/3, 18.5 mm; corpus width at M2, 12.5 mm; corpus width at M2/3, 13 mm.
Extended Data Figure 3 | Mandibular comparisons. a-m, H. habilis
sensu lato: OH 13 (a, late adolescent), OH 37 (b, horizontally flipped
image), KNM-ER 1802 (c, late adolescent), KNM-ER 3734 (d, horizontally
flipped image), KNM-ER 60000 (e, horizontally flipped image) (photo by
F. Spoor, copyright National Museums of Kenya); early Javanese H. erectus:
Sangiran 1b (f), Sangiran 9 (g), Sangiran 22 (h), Sb 8103 (i), Sangiran 21 (j);
Liang Bua H. floresiensis: LB1 (k), LB6/1 (l, horizontally flipped image;
the corpus is distorted); SOA-MM4 (m). Scale bar, 30 mm. Note that the
H. habilis mandibles tend to exhibit a thicker corpus, the position of the
basal ramus (filled arrow) that is shifted laterally relative to the corpus
midline axis, a prominent posterior part of the alveolar prominence (filled
triangle), and a wider extramolar sulcus between the anterior ramus root
(open arrow) and the molar row. The early Javanese H. erectus sample is
variable but includes specimens with weaker expressions in these traits.
The Liang Bua H. floresiensis and the SOA-MM4 mandibles share such
derived features with early Javanese H. erectus.
Extended Data Figure 4 | Comparisons of the hominin mandibles and teeth from So’a Basin (Mata Menge) and H. floresiensis from Liang Bua.
a, SOA-MM4 mandible. b, c, Right lateral and left lateral (horizontally flipped) views of LB1. d, Right lateral view of LB6/1. e, f, The SOA-MM3 and LB1
P3s, respectively. g, SOA-MM1 M1. h-j, Occlusal views of SOA-MM4 (h), LB1 (i), and LB6/1 (j) mandibles. Scale bar, 10 mm.
[Extended Data Figure 5a plot: scatter of PC scores (horizontal axis PC1, about -0.2 to 0.2). Legend: Au. afarensis; LD 350-1; H. habilis; early Javanese H. erectus; MP East Asia (Zhoukoudian); H. floresiensis (LB1); H. floresiensis (LB6/1); SOA-MM4.]

b
Variable (size-standardized)    PC1      PC2      PC3      PC4
Corpus height at M2             0.936    0.313   -0.160    0.013
Corpus height at M2/3           0.955   -0.222    0.198    0.016
Corpus width at M2             -0.803   -0.528   -0.275    0.034
Corpus width at M2/3           -0.934    0.325    0.145    0.022
Proportion (%)                  86       11       3        0
Cumulative proportion (%)       86       97       100      100

Extended Data Figure 5 | Principal component analyses of the four size-
standardized mandibular measurements. a, Scatter plot of the PC scores.
b, Component loading of each PC. PC1 does not distinguish Homo from
Au. afarensis, but Au. afarensis and post-habilis Homo are relatively
well-separated in PC2. SOA-MM4 belongs to the cluster of Homo in
this PC. SOA-MM4 occupies the space in between the two Liang Bua
H. floresiensis mandibles, suggesting their shared lateral corporal shape.
[Extended Data Figure 6 plots. Axes: dc mesiodistal (MD) diameter; dc labiolingual (LL) diameter; dc crown height (CH) index (CH/LL × 100); dc crown area (square root of MD × LL); crown size. Legend: Au. afarensis; Au. africanus; H. erectus (Zhoukoudian); SOA-MM7; SOA-MM8; H. sapiens.]

e
Variable (size-standardized)    PC1      PC2      PC3      PC4      PC5
Mesiodistal diameter            0.599    0.166   -0.554    0.553    0.031
Labiolingual diameter           0.497    0.798   -0.084   -0.328    0.031
Crown height                    0.859   -0.232    0.456    0.027    0.015
Mesial shoulder height         -0.163   -0.788   -0.496   -0.320    0.054
Distal shoulder height         -0.893    0.135    0.404    0.141    0.038
Proportion (%)                  53       21       18       8        0
Cumulative proportion (%)       53       74       92       100      100

Extended Data Figure 6 | Metric analyses of mandibular deciduous
canines. a-e, Comparisons of the crown length and breadth (a), and
relative crown height (b). Results of the PCA based on five size-adjusted
crown diameters (c, d) and the component loadings of each PC (e).
‘Crown size’ = geometric mean of the five crown diameters used.
Au. afarensis and H. sapiens are indistinguishable in crown size (d) but
they are discriminated from each other by PC1 (P < 0.00002, t-test).
SOA-MM7 occupies an intermediate position between Au. afarensis and
H. sapiens, suggesting its moderately primitive crown configuration. The
other PCs did not discriminate Au. afarensis and H. sapiens.
Extended Data Table 1 | 2014 Hominin fossil collection from Mata Menge

Specimen No. | Catalogue No. | Date of discovery | Portion
SOA-MM1 | MM14-T32D-F191 | 2014 Oct 08 | broken crown of left M1 (or M2)
SOA-MM2 | MM14-T32C-F234 | 2014 Oct 14 | complete crown and root of left I1
SOA-MM3 | MM14-T32D-F384 | 2014 Oct 14 | hominin cranial fragment?
SOA-MM4 | MM14-T32C-F277 | 2014 Oct 14 | right mandibular body
SOA-MM5 | MM14-T32C-F452 | 2014 Oct 16 | complete crown and root of right P3
SOA-MM6 | MM14-T32B-F94 | 2014 Oct 18 | broken root and crown fragments of right I1/2
SOA-MM7 | MM14-T32C-dry sieve | 2014 Oct 21 | nearly complete left dc
SOA-MM8 | MM14-T32B-dry sieve | 2014 Oct 24 | nearly complete right dc

All specimens are housed in the Geology Museum in Bandung.
Extended Data Table 2 | Hominin teeth from Mata Menge as compared to those of Liang Bua H. floresiensis (a)

Specimen | Tooth | Side | Wear (b) | Crown MD as measured (c) | Crown MD corrected (c) | Crown BL (c) | Crown height (d) | Cervical MD (e) | Cervical BL (e) | Root length (f)
SOA-MM2 | I1 | L | 5 | 7.6 | 8.0 | 6.4 | - | 5.5 | 5.7 | 12.0
LB15/2 | I1 | L | 7 | - | - | ≥6.2 | - | 5.2 | 6.2 | 12.0-13.0
SOA-MM5 | P3 | R | 3 | 6.6 | 6.8 | 9.4 | - | 4.4 | (8.4) | (12.5)
LB1/1 (h) | P3 | R&L | 3.5 | 6.85 | 7.0 | 9.25 | - | 4.85 | 9.0 | 14.95
SOA-MM6 | I1/2 | R | 5? | - | - | - | - | - | - | -
SOA-MM1 | M1 (or M2) | L | 2 | 9.7 | 9.7 | 8.9 | - | - | - | -
LB1 (h) | M1 | R&L | 4.75 | 9.25 | 9.6 | 10.5 | - | 8.2 | 9.1 | 12.4, 12.45 (g)
LB6/1 (h) | M1 | R&L | 4.5 | 8.9 | 9.2 | 10.0 | - | - | 8.8 | -
LB1 (h) | M2 | R&L | 3.75 | 9.8 | 10.1 | 10.2 | - | 8.8 | 8.9 | 12.5, 11.75 (g)
LB6/1 (h) | M2 | R&L | 3.5 | 9.45 | 9.7 | 9.55 | - | - | - | -
SOA-MM7 | dc | L | | 4.7 | 4.7 | 4.4 | (5.8) | 3.9 | 3.4 | -
SOA-MM8 | dc | R | | 4.8 | 4.9 | 4.5 | - | 3.9 | 3.4 | -

(a) Measurements of the Liang Bua hominins cited from ref. 16.
(b) Scored following ref. 49.
(c) Measured following the method of ref. 39.
(d) Buccal crown height.
(e) Cervical diameters as defined by ref. 50.
(f) Buccal root length.
(g) Lengths for mesial and distal roots, respectively.
(h) Average of the right and left sides.


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Table 3 | Comparative fossil samples

Sample | Age (Ma) | Portion | Composition | Data source
H. floresiensis (Liang Bua) | 0.1-0.06 | Mandible | LB1#, 6/1# | Originals
H. floresiensis (Liang Bua) | 0.1-0.06 | Teeth | LB1*, 6/1* | Ref. 16
Au. afarensis | 3.6-3.0 | Mandible | A.L. 198-22#, 225-8, 315-22, 330-5#, 417-1a#, 436-1#, 437-2, 438-1#, 444-2, 620-1, 188-1#, 198-1, 207-13, 266-1, 333w1a,b, 333w-32+60, MAL1/12#, MAK1/2#, LH4 | Ref. 51
“Earliest Homo” | 2.8 | Mandible | LD350-1# | Ref. 9
H. habilis | 2.3-1.6 | Mandible | KNM-ER 3734, 1805, 60000; OH 13#, 37# | Originals, Ref. 40
H. habilis | 2.3-1.6 | Teeth | A.L. 666-1; KNM-ER 1502*, 1507*, 1508*, 1590, 1801*, 1802*, 1805*, 1813, 2597*, 2601*, 60000; OH 7*, 13*, 16*, 37*, 39; L7-279*, 628-10, 894-1; Omo 75-14*, 75s-15*, 195-1630* | Refs. 3, 13, 40
Dmanisi Homo | 1.75 | Teeth | D211, 2600, 2700, 2735 | Ref. 15
Early Javanese H. erectus (older) | | Mandible | Sangiran 1b#, 5, 9#, 22 | Originals, Ref. 52
Early Javanese H. erectus (older) | | Teeth | Sangiran 1b*, 4, 5*, 6b*, 6b, 7-35, 7-42*, 7-43*, 7-58, 7-61*, 7-62*, 7-63*, 7-64*, 7-65*, 7-76*, 7-78*, 7-84*, 7-85, 7-86, 22*; Bk 7905* | Ref. 3
Early Javanese H. erectus (younger) | 1.0-0.8 | Teeth | Sangiran 7-20*, 7-21*, 7-22*, 7-27, 7-31, 7-32; Sb 8103*, Ng 8503* | Ref. 3
Middle Pleistocene East Asian Homo | c. 0.75-0.05 | Mandible | Lantian; Zhoukoudian G1#, H1#, K1#, PA86; Penghu 1 | Originals, casts, Ref. 53
Middle Pleistocene East Asian Homo | c. 0.75-0.05 | Teeth | Zhoukoudian 1/2, 4, 19, 34, 35, 36/37, 43, 44, 45, 97, 98, 99, 107, 108, 137; Lantian PA102; Hexian PA834, 835, 838, 839, AN1644; Chaoxian; Xujiayao PA1480-1, 1480-3 | Refs. 3, 41, 42

#Specimens included in the PCAs.
*Specimens included in the EFAs.
Refs. 51-53 are cited in this table.
Extended Data Table 4 | Comparative Homo sapiens dental sample

Sample | Remarks | N (a) | Repository (b)
Prehistoric Southeast Asia
Flores* | Aimere, Gua Alo, Gua Nempong, Liang Bua, Liang Momer, Liang Toge, Liang X | 5 | NBC, ARKENAS
Java* | Hoekgrot, Wajak | 3 | NBC
Malaysia* | Guar Kepah | 19 | NBC
Vietnam | Mai Da Dieu, Mai Da Nuoc, Hang Chim, Dong Cang, Con Co Ngua | | IAH
Australia/Melanesia
New Guinea* | | 30 | AMNH, MH
Indigenous Australian/Tasmanian* | | 19 | AMNH
Southeast Asia
Philippine Negrito* | | 20 | MH
Others | Andaman, Indonesia, Malaysia, Nicobar, Philippine, Singapore, Thailand | 57 | AMNH, MH
Northeast Asia
Northeast Asia | China, Chukuci, Korea, Mongol, Yukagir | 18 | AMNH
Africa
Bushman | | 17 | AMNH, MH
African Pygmy | | 20 | MH
South Africa | Excluding Bushman | 26 | AMNH
East Africa | | 45 | AMNH
West Africa | Excluding Pygmy | 55 | AMNH
Indo/Europe
India | | 6 | AMNH
German | | 65 | AMNH
Others | Hungary, Poland, Sweden | 8 | AMNH
Total | | 490 |

(a) Number of individuals.
(b) NBC, Naturalis Biodiversity Center, Leiden; ARKENAS, National Research and Development Centre for Archaeology, Jakarta; AMNH, American Museum of Natural History, New York; MH, Musée de l'Homme, Paris; IAH, Institute of Archaeology, Hanoi.
*Samples included in the EFAs.
Age and context of the oldest known hominin fossils from Flores

Erick Setiyabudi, Rainer Grün, Mark W. Moore, Dida Yurnaldi, Mika R. Puspaningrum, Unggul P. Wibowo,
Halmi Insani, Indra Sutisna, John A. Westgate, Nick J. G. Pearce, Mathieu Duval, Hanneke J. M. Meijer, Fachroel Aziz,
Thomas Sutikna, Sander van der Kaars, Stephanie Flude & Michael J. Morwood


Recent excavations at the early Middle Pleistocene site of Mata 
Menge in the So’a Basin of central Flores, Indonesia, have yielded 
hominin fossils attributed to a population ancestral to Late
Pleistocene Homo floresiensis. Here we describe the age and
context of the Mata Menge hominin specimens and associated
archaeological findings. The fluvial sandstone layer from which the
in situ fossils were excavated in 2014 was deposited in a small valley
stream around 700 thousand years ago, as indicated by ⁴⁰Ar/³⁹Ar and
fission track dates on stratigraphically bracketing volcanic ash and 
pyroclastic density current deposits, in combination with coupled 
uranium-series and electron spin resonance dating of fossil teeth. 
Palaeoenvironmental data indicate a relatively dry climate in the 
So’a Basin during the early Middle Pleistocene, while various lines 
of evidence suggest the hominins inhabited a savannah-like open 
grassland habitat with a wetland component. The hominin fossils 
occur alongside the remains of an insular fauna and a simple stone 
technology that is markedly similar to that associated with Late 
Pleistocene H. floresiensis. 

Mata Menge is located near the northwestern margin of the So’a 
Basin, a ~400 km² geological depression in central Flores (Fig. 1). The
basement substrate consists of the Ola Kile Formation (OKF), a greater
than 100-m-thick sequence of indurated volcaniclastic deposits.
Zircon fission-track (ZFT) age determinations date the upper OKF to
1.86 ± 0.12 million years ago (Ma) (ref. 4). The ~5° southward dipping
volcanic breccias of the OKF are associated with a former volcanic
centre, the Welas Caldera, on the northwestern margin of the basin
(Fig. 1). The OKF is unconformably overlain by the Ola Bula Formation
(OBF). A focus of palaeoanthropological research since the 1950s,
the OBF is up to 120 m thick and comprises an intra-basinal fossil-
and stone artefact-bearing sequence composed largely of undistorted
volcanic, fluvial, and lacustrine sediments deposited between 1.8 and
0.5 Ma (Supplementary Information Table 1). An extensive lacustrine
sequence—the ‘Gero Limestone Member’ (GLM)—caps the basin infill 
and registers the formation of a basin-wide freshwater lake. 

The total preserved thickness of the OBF at Mata Menge is 40 m
(Fig. 1). The uppermost interval of the GLM, with a thickness of 9 m,
crops out on a hill 600 m west (excavation number 35, or E-35). The 
two main fossil-bearing intervals at Mata Menge form part of a roughly 
NNW-SSE trending palaeovalley-fill sequence dominantly occupied by 


cut-and-fill fluviatile and clay-rich, water-supported mass flow deposits 
(mudflows). The studied upper fossiliferous interval, which contains 
the hominin fossils, is exposed at the head of a modern stream valley 
at the base of a hill (height = 397 m). This less than 30-cm-thick OBF 
sandstone, named Layer II, is well-consolidated, fine- to medium- 
grained, and contains locally faint parallel laminations in the lower 
part, as well as numerous water-worn volcanic pebbles (<60 mm). 
Layer II is discontinuous towards the west and east, and it has an irreg- 
ular lower bedding plane that cuts down into the underlying unit, a 
well-developed, consolidated palaeosol (Layer III). A ~6.5-m-thick 
sequence of mudflow deposits (Layers I-a to I-f) overlies Layer II and 
is separated from it by a generally sharp contact surface. Layer II rep- 
resents the deposit of a small, sinuous stream tributary with a NNW 
to SSE flow direction, as deduced from the slight decrease in elevation 
of the top of Layer II in the same direction (20 cm over a horizontal 
distance of 17 m).
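
As a quick check of how gentle this gradient is, the stated drop of 20 cm over 17 m can be converted to a slope. The snippet is only worked arithmetic on the two numbers given in the text; the variable names are illustrative.

```python
import math

drop_m = 0.20       # decrease in elevation of the top of Layer II (20 cm)
distance_m = 17.0   # horizontal distance over which the drop was measured

gradient = drop_m / distance_m
slope_percent = 100.0 * gradient
slope_degrees = math.degrees(math.atan(gradient))

print(f"gradient {gradient:.4f}, {slope_percent:.2f} %, {slope_degrees:.2f} degrees")
```

This works out to a slope of roughly 1.2%, or about two-thirds of a degree.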

We conducted a 50 m² excavation (E-32) into Layer II in 2013 (Fig. 1,
Extended Data Figs 1 and 2), yielding fossils of Stegodon florensis,
giant rat (Hooijeromys nusatenggara, first described in ref. 15), Komodo
dragon (Varanus komodoensis), and crocodile, as well as stone artefacts
(Fig. 2). In 2014, we exposed Layer II over a larger area, recovering
seven hominin fossils (six teeth and a mandible fragment), and an
undiagnostic hominin cranial fragment. The hominin fossils were
embedded in the sandstone matrix of Layer II near the stratigraphic 
interface with the overlying mudflow deposit (Extended Data Fig. 2). 

Layers I-a to I-f are clearly related to eruptive activity within the 
Welas Caldera, then occupied by a lake. Four articulated thoracic ver- 
tebrae of S. florensis were recovered from Layer II (Fig. 2k). These are 
the only articulated stegodont elements so far recovered at Mata Menge, 
indicating relatively limited post-mortem modification before burial 
by mudflows. We hypothesize that the artefacts and faunal remains, 
including hominin elements, were transported short distances by the 
stream that deposited Layer II, before mudflows originating from 
within the Welas Caldera inundated these valleys with metre-thick 
muddy debris. The presence of elements from multiple hominin indi- 
viduals could be the result of the same volcanic event that triggered the 
mudflows. Presently, however, it is not possible to estimate the time 
interval separating the deaths of the hominins from the deposition of 
the mudflows. 

Figure 1 | Context and chronology of the hominin fossils at Mata
Menge. a, b, Location of Flores and the So’a Basin. c, Digital elevation map
of the So’a Basin, with location of Mata Menge and other sites mentioned
in the text. A single outlet of the main river system (the Ae Sésa) drains
the basin via a steep-walled valley towards the northeast. d, Stratigraphy
and chronology of the main fossil-bearing intervals and intervening Ola
Bula Formation (OBF) deposits at Mata Menge. Several basin-wide key
marker tephra beds that are exposed in the hill flank on the northern
side of Mata Menge (trench E-34/34B) are eroded in the central part of
the stream valley, where they are replaced by a 4-5-m-thick sequence of
tuffaceous mudflows with intervening fluvial lenses forming the lower
fossil-bearing palaeovalley-fill sequence. e, f, Context of the hominin
fossils; f is a 3D image of Mata Menge and surrounds, with excavated
trenches outlined in red and labelled, and e is a 3D representation of
the stratigraphy exposed by trench E-32A to E, with coloured ovals
denoting the positions of in situ hominin fossils (SOA-MM1, 2 and
4-6) excavated from the fluvial sandstone unit, Layer II. The remaining
hominin specimens were retrieved in the sieves. Trenches E-1 to E-8 were
excavated between 2004 and 2006, at the section originally excavated by
Th. Verhoeven in the 1950s. The remaining trenches were excavated
between 2010 and 2015. E-12 is a slot-trench excavated into the side of
a hill, revealing an 18-m-thick sequence of lacustrine clays and micritic
limestones, fluvial sandstone beds, massive tuffaceous mudflows,
well-developed clay-textured palaeosols, and numerous centimetre-thick
basaltic tephra inter-beds from the middle-upper part of the OBF. At the
base of this slot-trench, a less than 30-cm-thick fossil-bearing sandstone
unit (Layer II), from which all the hominin fossils were retrieved, was
exposed. Tephra codes in d are as follows (top to bottom): T6 (upper
inter-regional tephra); PGT-2 (Piga Tephra 2); T-UMM (Upper Mata
Menge Tephra); T-LMM (Lower Mata Menge Tephra); T-Pu (Pu Maso
Tephra); T3 (lower inter-regional tephra); T-T (Turakeo Tephra); T-WSI
(Wolo Sege Ignimbrite); and T-W (Wolowawu Tephra). The original
published ⁴⁰Ar/³⁹Ar age for T-WSI is 1.02 ± 0.02 Ma (ref. 14); however,
when recalculated to the recently determined value for the age standard
ACS-2 used in this study (1.185 Ma; see reference 25 in the Supplementary
Information), T-WSI becomes 1.01 ± 0.02 Ma.

A total of four new radioisotopic age determinations, with ages in
sequential order and in accordance with the stratigraphic sequence,
provide a chronology for the hominin fossils (Fig. 1; Supplementary
Information). Near the base of the OBF at Mata Menge, a widespread
ignimbritic marker bed (the Wolo Sege Ignimbrite; T-WSI) with an
⁴⁰Ar/³⁹Ar age of 1.01 ± 0.02 Ma (ref. 14; Fig. 1) is recognized on the
combined basis of its stratigraphic association, unique depositional
architecture, and glass-shard major element chemistry (Extended
Data Fig. 3). In addition, the hominin find-locality in E-32 is
situated 12.5 m stratigraphically above a previously reported ZFT
date of 0.80 ± 0.07 Ma from Mata Menge. To verify this prior
estimate, we conducted isothermal plateau fission-track (ITPFT)
dating of glass shards from an inter-regional tephra marker (T3)
identified at several So’a Basin localities, including just above the
T-WSI at Mata Menge (in E-34/34B), returning a weighted mean age
of 0.90 ± 0.07 Ma (based on two independent age determinations)
(Supplementary Information Table 2). ⁴⁰Ar/³⁹Ar single crystal dating
of hornblende from the Pu Maso Ignimbrite (T-Pu) located just
above T3 in E-34/34B yielded a weighted mean age of 0.81 ± 0.04 Ma
(Extended Data Fig. 4), which is stratigraphically consistent with that
of underlying T3. These ages demonstrate that Layer II was deposited
after ~0.80 Ma.


Figure 2 | Stone artefacts and fossils from Mata Menge. All specimens
are from the hominin fossil find-locality (Layer II fluviatile sandstone,
trench E-32). a, Bifacial core (chlorite). b, c, Chert flakes. d, Chalcedony
flake. e, Rhyolite flake. f, Right maxilla fragment (M1-M3), Hooijeromys
nusatenggara. g, Left mandible fragment (m1-m3, i), H. nusatenggara.
h, Right maxilla fragment, Varanus komodoensis. i, Crocodile tooth.
j, Right coracoid of a duck (cf. Tadorna). k, Stegodon florensis thoracic
vertebrae in articulation (still partially embedded in sandstone matrix).
Scale bars, 10 mm (a-j); 100 mm (k).
To further constrain the age of the hominin fossils, we carried out
⁴⁰Ar/³⁹Ar dating on one basaltic tephra and one rhyolitic tephra from
the GLM above Layer II (E-12 and E-35). The GLM contains at least
85 crystal-rich tephra inter-beds of basaltic composition, collectively
named the Piga Tephra (the lower 56 tephras are sequentially numbered
PGT-1 to PGT-56). At Mata Menge, PGT-2 occurs 13.5 m above Layer
II, and produced a ⁴⁰Ar/³⁹Ar weighted mean age of 0.65 ± 0.02 Ma
from single crystal dating of hornblende (Extended Data Fig. 5).
This is in accordance with the published ZFT age of a basaltic tephra
inter-bed from the lower part of the GLM (0.65 ± 0.06 Ma). Finally,
a biotite-bearing vitric-rich ash of distinctive rhyolitic composition
(T6; Extended Data Fig. 3) from the top of the GLM has an ⁴⁰Ar/³⁹Ar
age of 0.51 ± 0.03 Ma, based on the weighted mean of single grain
feldspar analyses (Extended Data Fig. 6). Thus, the hominin fossils,
constrained by the lowermost of these two radioisotopic dates within the
GLM, have a minimum age of ~0.65 Ma.

We also conducted uranium series (U-series) dating of a hominin 
tooth root fragment (SOA-MM6) from Layer II, and combined U-series 
and electron spin resonance (ESR) dating of two S. florensis molars 
excavated in situ from the same sedimentary context (Extended Data 
Fig. 7; Supplementary Information). U-series dating of the hominin 
tooth root independently confirms that this specimen has an age of 
at least 0.55 Ma, whereas combined U-series/ESR dating indicates 
minimum and maximum ages of around 0.36 Ma and 0.69 Ma, respec- 
tively, for the Stegodon molars. In sum, therefore, we have used multiple 
dating methods to establish an age of ~0.70 Ma for the hominin fossils. 
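
The bracketing logic of the preceding paragraphs can be made explicit with a small sketch. It is a simplification under stated assumptions: each result is treated as a hard bound at its central value (the quoted ± uncertainties are ignored), and the class and function names are hypothetical. It is not how the authors derived their ~0.70 Ma estimate; it only shows that the quoted constraints overlap.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Constraint:
    label: str
    min_age: float  # Ma; the fossils are at least this old
    max_age: float  # Ma; the fossils are at most this old


# Central values quoted in the text; uncertainties deliberately ignored.
constraints = [
    Constraint("T-Pu 40Ar/39Ar, below Layer II", 0.0, 0.81),
    Constraint("PGT-2 40Ar/39Ar, above Layer II", 0.65, INF),
    Constraint("U-series, hominin tooth root", 0.55, INF),
    Constraint("U-series/ESR, Stegodon molars", 0.36, 0.69),
]


def bracket(items):
    """Intersect all constraints; raise if they are mutually inconsistent."""
    lower = max(c.min_age for c in items)
    upper = min(c.max_age for c in items)
    if lower > upper:
        raise ValueError("constraints do not overlap")
    return lower, upper


lower, upper = bracket(constraints)
print(f"Overlap of central values: {lower:.2f} to {upper:.2f} Ma")
```

Once the stated uncertainties are taken into account the window widens, which is consistent with the rounded estimate given in the text.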

Our systematic, high-volume excavations (~560 m²) at Mata
Menge between 2010 and 2015 yielded many fossil vertebrate remains 
(Supplementary Information). To date, 75% of the >7,000 vertebrate 
fossils recovered from E-32 have been analysed, and include S. florensis 
(23.7% of the number of identified specimens (NISP)), V. komodoensis 
(0.6% of NISP), freshwater crocodiles (3.7% of NISP), frogs (0.3% of 
NISP), murine rodents (15.6% of NISP), and birds (0.5% of NISP), the 
remainder comprising unidentifiable bone fragments. From the lower 
fossil-bearing interval (E-1 to 8 and E-11 to 31D), the remains of at least
120 S. florensis individuals are represented by dental elements spanning
all ontogenetic stages. The age profile of this death-assemblage
corresponds to that of a living population. The lack of age-selective
mortality fits a mass-death event, unlike the juvenile-dominated pattern
encountered in the Stegodon death-assemblage of the H. floresiensis
type-locality, Liang Bua. In Layer II, remains of juvenile, sub-adult,
intermediate-aged, and old Stegodon individuals are also present, but 
the minimum number of individuals (MNI = 15) is too low to allow 
construction of a reliable age profile. 
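
The taxonomic shares quoted above for E-32 can be tallied to see how much of the analysed material falls into the unidentifiable remainder. The snippet only sums the percentages given in the text; the dictionary keys are informal labels.

```python
# Percentages of the analysed E-32 vertebrate fossils, as quoted in the text.
share_percent = {
    "Stegodon florensis": 23.7,
    "Varanus komodoensis": 0.6,
    "freshwater crocodiles": 3.7,
    "frogs": 0.3,
    "murine rodents": 15.6,
    "birds": 0.5,
}

identified = round(sum(share_percent.values()), 1)
unidentified = round(100.0 - identified, 1)

print(f"identified taxa: {identified} %, unidentifiable remainder: {unidentified} %")
```

On these figures, a little under half of the analysed specimens are assigned to a taxon, and Stegodon and murine rodents together account for most of those.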

We conducted carbon and oxygen isotope analysis of tooth enamel 
samples collected from several S. florensis and murine rodent individ- 
uals from the two fossil-bearing levels at Mata Menge (Extended Data 
Fig. 8). The results indicate a diet dominated by C4 grasses, suggesting
both animals were grazers, and implying that open grasslands were 
the major vegetation type in the So’a Basin. The recovery of rare fossils 
of rails, swans, ducks, eagles, and eagle owls from the lower trenches 
(~0.80 to 0.88 Ma) provides further evidence for the presence of 
a savannah-like biome with wetland habitats, as well as scattered 
patches of forest. Fossil pollen and phytoliths from both fossil levels
offer additional indications that grasses dominated the early Middle 
Pleistocene vegetation (Supplementary Information Table 9). Abundant 
moulds and casts of two freshwater gastropod species (Cerithidea) were 
recovered from Layer II, pointing to the existence of permanent fresh- 
water bodies in the ancestral stream valley. 

Our excavations uncovered 149 in situ stone artefacts in E-32, includ- 
ing 47 artefacts from Layer II, in direct association with the hominin 
remains (Fig. 2; Extended Data Fig. 9). Some of the artefacts from E-32 
are lightly to heavily abraded from low-energy water transport, but
74.5% are in fresh, as-struck condition, suggesting minimal disloca- 
tion from nearby stone-flaking areas. Overall, the E-32 assemblage 
reflects a technologically straightforward core-and-flake approach to 
stoneworking. As yet, no butchery marks have been identified on the
faunal remains at Mata Menge. 

Notably, the tools and flaking technology in E-32 are nearly identical
in size and nature, respectively, to the assemblage dating some
110 thousand years (kyr) earlier at Mata Menge, including 1,186
analysed stone artefacts from E-23 and E-27 excavated between 2011
and 2014 (Supplementary Information Table 6). The E-32 assemblage
is also technologically similar to the artefacts from Liang Bua, dating
~600 kyr later and associated with H. floresiensis. The long
persistence of this technology suggests stability in the behaviour of
H. floresiensis. In contrast, the only lithic assemblage recovered in situ
below the T-WSI (which has a minimum age of 1.01 ± 0.02 Ma and is
the oldest known technology from Flores), while similar, features a
typologically distinct element: large Acheulean pick-like implements
associated elsewhere with cognitively advanced tool-making. It
is unclear why these artefacts are absent from the later technology of 
Flores. A shift to more arid conditions could have stimulated a series of 
technological changes. Alternatively, the earliest inhabitants of Flores 
may have responded to the limited resources of the island by reduc- 
ing the complexity of their tool-making repertoire to the minimum 
required for survival. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


For isothermal plateau fission track (ITPFT) dating of hydrated glass shards,
the population-subtraction technique was applied. This technique is grain
specific, in that every grain is examined separately under the microscope and a
mean age is derived from a large number of shards. In this case, contamination can
be readily monitored, and, if necessary, checked by electron microprobe analysis
(EMA) of individual shards. Chemical homogeneity indicates derivation from a
single eruptive event with a strong likelihood of uniform U content. Chunky, low
vesicular glass shards in the size range of 500-250 µm were separated in order
to maximise the surface area of the glass in the polished section and to optimise
fission-track counting. ITPFT ages for T3 at two So’a Basin localities (Kopowatu,
UT2382 and Lowo Mali, UT2383) are given in Supplementary Information Table 2,
as well as operating conditions and the ages determined on the Huckleberry Ridge
tephra internal standard. The single-crystal (sanidine) laser-fusion ⁴⁰Ar/³⁹Ar age of
Huckleberry Ridge tephra is 2.003 ± 0.014 Ma (2σ error), and is indistinguishable
from the PTF-corrected age of 2.08 ± 0.21 Ma determined on UT1366 using the
diameter-corrected procedure (DCFT) (see Supplementary Information Table 2).
All samples were irradiated at the same time in a single can. Ages were calculated
using the zeta approach and λD = 1.551 × 10⁻¹⁰ per year. Zeta value is 301 ± 3
based on 6 irradiations, using the NIST SRM 612 glass dosimeter and the Moldavite
tektite glass (Lhenice locality) with an ⁴⁰Ar/³⁹Ar plateau age of 14.34 ± 0.08 Ma
(refs 36,37). Ages are those corrected for partial track fading (PTF), achieved by
the isothermal plateau method (ITPFT) and the diameter-corrected procedure
(DCFT). Following irradiation, both T3 samples were subjected to a single
heat treatment of 150 °C for 30 days. After heating, the spontaneous and induced
sample slides were simultaneously etched in 24% HF for 110 s. The Ds and Di
track diameters were then measured and the Ds/Di ratio determined. Provided
that the Ds/Di ratio was close to unity and that the samples were adequately
etched (as evidenced by the average track diameters being within the range of
6-8 µm; ref. 31), the samples were corrected for partial track fading (PTF) and
the track densities determined. Area was then estimated using the point-counting
method. The corresponding ρs/ρi ratio is equivalent to the true track density ratio.
The age calculated from this ratio is therefore equivalent to the true age of the
sample. The precision of ITPFT ages for some samples was improved by
independent determinations made by different operators, in some cases using slightly
different etching conditions. The weighted mean age and error of corrected ages
for UT2382 and UT2383 is 0.90 ± 0.07 Ma (see Supplementary Information Table 2).
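
A weighted mean age of this kind is commonly computed by inverse-variance weighting, and the sketch below shows that calculation. Two caveats apply: the paper does not spell out its weighting scheme, so inverse-variance weighting is an assumption here, and the input ages are invented for illustration (the individual UT2382 and UT2383 results are in Supplementary Information Table 2, which is not reproduced in this text).

```python
import math


def weighted_mean(ages, errors):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * a for w, a in zip(weights, ages)) / total
    return mean, math.sqrt(1.0 / total)


# Invented example values (Ma), NOT the ages reported for UT2382/UT2383.
ages = [1.00, 1.20]
errors = [0.10, 0.20]

mean_age, mean_error = weighted_mean(ages, errors)
print(f"{mean_age:.2f} ± {mean_error:.2f} Ma")
```

The more precise determination dominates the result, and the combined uncertainty is smaller than either individual error.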

Glass shard major element determinations. Glass shard major element deter- 
minations were conducted on all rhyolitic pyroclastic density current (PDC) and 
airfall deposits at Mata Menge, as well as potential correlatives from other Soa 
Basin sites. Glass shard major element data was acquired with a JEOL Superprobe 
(JXA-8230), using the ZAF correction method. Analyses were performed with 
15 kV accelerating voltage, 8 nA beam current, and an electron beam defocused to 
between 20 to 10j1m. Standardization was achieved by means of mineral and glass 
standards. A rhyolitic glass standard (ATHO-G) was routinely used to monitor 
calibration in all analytical runs, and used to evaluate any day-to-day differences 
in the calibration. The large number of samples precluded conducting all analyses 
in a single batch. All analyses are normalized to 100 wt. % anhydrous, with H2O
by difference being given, and total Fe is reported as FeO. Glass shard major ele- 
ment analyses are presented in Supplementary Information Table 3. Trace element 
analyses were conducted on individual glass shards from two distinct rhyolitic 
tephra marker beds of presumed distal source (T3 correlatives from Mata Menge, 
Lowo Mali and Kopowatu, and T6 from Mata Menge, respectively). This trace 
element data was then directly compared with reference data from potential distal 
tephra correlatives (that is, Youngest Toba Tuff (YTT), Middle Toba Tuff (MTT), 
Oldest Toba Tuff (OTT) and Unit E from ODP-758) acquired on the same instru- 
ment using the same standards and under the same analytical conditions**? (see 
Extended Data Fig. 3i-m). Trace element analyses were performed by laser ablation 
(LA) ICP-MS, using a Coherent GeoLas ArF 193 nm Excimer LA system coupled 
to a Thermo Finnigan Element 2 sector field ICP-MS. Trace element data were
collected for individual shards, with the majority of analyses performed using 
20 µm ablation craters. Laser fluence was 10 J cm⁻² at a repetition rate of 5 Hz for
a 24 s acquisition. The minor ²⁹Si isotope was used as the internal standard, with
SiO2 (determined by EMPA) used to calibrate each analysis, after normalization
to an anhydrous basis. The NIST 612 reference glass was used for calibration, 
taking concentrations from established sources”. A fractionation factor was 
applied to the data to account for analytical bias related to the different matrices of 
the reference standard and the sample. Explication of this factor as well as ICP-MS 
and laser operating conditions is given elsewhere*!. The MPI-DING reference 
glass ATHO-G (ref. 42) was analysed as an unknown under the same operating 
conditions at the same time. Analytical precision is typically ± 5-10%, and
accuracy is typically around ± 5%, when compared with the published GeoReM
concentrations for ATHO-G. Glass shard trace element analyses are presented in
Supplementary Information Table 4.
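
The anhydrous normalization described above (oxides rescaled to 100 wt. %, with H2O given by difference) can be sketched as follows. The oxide values are hypothetical and the function name is illustrative; the sketch also assumes that the analysed oxides and water are the only components.

```python
def normalize_anhydrous(oxides_wt):
    """Rescale measured oxides to sum to 100 wt.% and report H2O by difference."""
    total = sum(oxides_wt.values())
    h2o_by_difference = 100.0 - total
    normalized = {name: 100.0 * value / total for name, value in oxides_wt.items()}
    return normalized, h2o_by_difference


# Hypothetical raw microprobe analysis of a rhyolitic glass shard (wt.%).
raw = {"SiO2": 72.0, "Al2O3": 12.0, "FeO": 1.2, "CaO": 1.0, "Na2O": 3.4, "K2O": 4.4}
normalized, h2o = normalize_anhydrous(raw)
print(round(normalized["SiO2"], 2), round(h2o, 2))
```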

Single crystal laser fusion ⁴⁰Ar/³⁹Ar dating. We conducted single crystal laser
fusion ⁴⁰Ar/³⁹Ar dating of the Mata Menge volcanic units. Hornblende crystals
(<2 mm in length) from the basaltic PGT-2 tephra sample (T XII 252-261) were
pre-concentrated along with other ferromagnesian minerals using standard heavy 
liquid techniques and then distinguished from pyroxene using a Bruker micro-XRF,
followed by handpicking of individual grains under a binocular microscope.
For the other samples, hornblende (1-2 mm in length from FLO15-15) and small 
feldspar crystals (<0.5 mm in length from FLO15-09/2) were handpicked under a 
binocular microscope from the sieved and washed <2 mm size fraction. Crystals 
were loaded into wells in 18 mm-diameter aluminium sample discs for neutron 
irradiation, along with the Alder Creek sanidine age standard (ACs-2)*? as the 
neutron fluence monitor. In this study, we report our age determinations relative 
to the recently published and astronomically calibrated 1.185 Ma value for ACs-2 
(ref. 44). Neutron irradiation was performed in two batches (QL-OSU-39 and 
QL-OSU-42), each with a duration of 15 min, in the cadmium-shielded CLICIT 
facility at the Oregon State University TRIGA reactor. Argon isotopic analyses of 
gas released during the CO2 laser single crystal fusion experiments (Supplementary
Information Table 5) were made on a fully automated Nu Instruments Noblesse 
multi-collector noble-gas mass spectrometer, using procedures documented pre- 
viously'*°, Reconnaissance isotopic measurements of small gas aliquots released 
from feldspar crystals during an initial low-temperature heating step, using a low- 
power defocused laser beam, allowed the identification of scarce, K-rich grains 
within a population dominated by Ca-rich, K-poor plagioclase; the latter not being 
amenable to ⁴⁰Ar/³⁹Ar dating because of small signal size. Fusion experiments
on eight of these relatively K-rich grains identified by this prospecting method 
yielded small but measurable amounts of “°Ar (7.5 107!” to 1.3x 107) mol Ar) 
and ³⁹Ar. Sample gas cleanup was through an all-metal extraction line, equipped
with a −130 °C cold trap (to remove H2O) and two water-cooled SAES GP-50 getter
pumps (to absorb reactive gases). Argon isotopic analyses of unknowns, blanks, 
and monitor minerals were carried out in identical fashion during a fixed period 
of 400 s in 14 data acquisition cycles. ⁴⁰Ar and ³⁹Ar were measured on the
high-mass ion counter, ³⁸Ar and ³⁷Ar on the axial ion counter and ³⁶Ar on the low-mass
ion counter, with baselines measured every third cycle. Measurement of the ⁴⁰Ar,
³⁸Ar and ³⁶Ar ion beams was carried out simultaneously, followed by sequential
measurement of ³⁹Ar and ³⁷Ar. Beam switching was achieved by varying the field
of the mass spectrometer magnet and with minor adjustment of the quad lenses. 
Data acquisition and reduction were performed using the program 'Mass Spec'
(A. Deino, Berkeley Geochronology Center). J-values for unknowns were cal- 
culated by using a plane-fitting algorithm in the Mass Spec software applied to 
ACs-2 standard data from symmetrically distributed sample wells across the alu- 
minium irradiation disk (Supplementary Information Table 5). Detector inter- 
calibration and mass fractionation corrections were made using the weighted 
mean of a time series of measured atmospheric argon aliquots delivered from 
a calibrated air pipette'**>. Sample sets were bracketed by both air pipette and 
ACs-2 to monitor for possible instrumental drift. Blanks were measured at least 
once before each run of an unknown and had typical values of <1.5 x 10716 mol 
4Arand <1 x 10-8 mol **Ar for QL-OSU-39 experiments, and 2-3 x 107! mol 
Ar and <2.5 x 10~'§ mol **Ar for those of QL-OSU-42. The higher blanks of 
the QL-OSU-42 experiments can be attributed to the venting and baking of the 
Noblesse mass-spectrometer after the completion of QL-OSU-39 analyses. Decay 
and other constants, including correction factors for interference isotopes pro- 
duced by nucleogenic reactions, are given in Supplementary Information Table 5. 
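
The plane-fitting step for J-values can be illustrated with a short sketch. The algorithm used inside the Mass Spec software is not described here, so the code below assumes an ordinary unweighted least-squares plane J = a + b·x + c·y through the standard wells; the well coordinates and J-values are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a + b*x + c*y through (x, y, z) points."""
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    normal = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(normal)
    coeffs = []
    for col in range(3):  # Cramer's rule, one coefficient per column
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Hypothetical standard wells: (x, y, J) with positions in arbitrary disc units.
standards = [(1.0, 0.0, 1.01e-4), (-1.0, 0.0, 0.99e-4),
             (0.0, 1.0, 0.98e-4), (0.0, -1.0, 1.02e-4)]
a, b, c = fit_plane(standards)
j_unknown = a + b * 0.5 + c * 0.5  # interpolated J at an unknown's well
print(j_unknown)
```

Placing the standards symmetrically around the disc, as described above, keeps the normal equations well conditioned and means every unknown is interpolated rather than extrapolated.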

Laser ablation U-series analyses. Laser ablation U-series analyses were carried
out on two fragments of the same hominin incisor, SOA-MM6 (sample number
3543A, a fragment of the tooth crown, including portions of dentine and enamel
tissues; and sample number 3543B, a cross-section of the root only),
as well as on two Stegodon florensis molars (sample number 3541, and a
molar broken into two fragments, which we subdivided into sample
number 3542 and sample number 3544, respectively). The experimental setup, measurement
conditions, and data evaluation followed principles and procedures described in 
ref. 46. The dentine and enamel tissues of sample number 3543A were analysed 
on different tracks, whereas three tracks were analysed across the root section 
(sample number 3543B). The Stegodon molars were analysed along several tracks 
that cut across the dental tissues. No individual age calculation was carried out when
the U-concentrations were below about 0.5 ppm and detrital ²³²Th was observed
(elemental U/Th ratios below 300). The analytical data of the enamel and dentine
sections were integrated to provide the data input for the ESR age calculations. 
All results are shown in Supplementary Information Table 6. ESR dating was also 
performed on the two S. florensis molars noted above. The fossil teeth were pre- 
pared following a standard ESR dating procedure for enamel powder‘’. The grey 
enamel layer was mechanically separated from the other dental tissues, and both
inner and outer surfaces were removed with a dental drill to eliminate the volume 
that had received an external alpha dose. The samples were then ground and sieved 
to recover the size fraction of <200 µm. Dose evaluation used the multiple aliquot
additive dose method. The powder was split into several aliquots and irradiated up 
to 4019 Gy with a Gammacell 1000 Cs-137 gamma source. ESR measurements were 
carried out with a Bruker Elexsys 500 spectrometer, using the following acquisition 
parameters: 3-5 scans, 2 mW microwave power, 1024 points resolution, 12 mT
sweep width, 50 kHz modulation frequency, 0.1 mT modulation amplitude, 20 ms 
conversion time and 5 ms time constant. The ESR intensities were extracted from 
T1-B2 peak-to-peak amplitudes of the ESR signal*®, and then normalized on the 
number of scans and mass. All aliquots of a given sample were measured within a 
short time interval (<1 h). This procedure was repeated twice over two successive
days without removing the enamel from the ESR tubes between measurements in
order to evaluate measurement precision and thus DE reproducibility: the latter
was found to be excellent, with a variability of <3% between the two repeated 
measurements. Fitting procedures were carried out with the Microcal OriginPro 
9.1 software using a Levenberg-Marquardt algorithm by chi-square minimisation. 
Data were weighted by the inverse of the squared ESR intensity (1/I²) (ref. 49). Final
equivalent dose (DE) values were obtained by fitting a single saturating exponential
(SSE) through the pooled ESR intensities obtained from the two repeated
measurements. Given the magnitude of the DE values (between 400 Gy and 600 Gy),
a maximum irradiation dose (Dmax) of 4019 Gy was used, so that the Dmax/DE ratio
remains between 5 and 10 as recommended in a recent study (ref. 50) to ensure
reliable fitting. The final dose response curves (DRCs) are shown in Extended Data
Fig. 7f, g. For the dose rate calculations, the following parameters were used: an
alpha efficiency of 0.13 ± 0.02 (ref. 51), Monte-Carlo beta attenuation factors from
ref. 52, dose-rate conversion factors from ref. 53, an estimated water content of
5 ± 3 wt.% in dentine and 20 ± 5 wt.% in sediment. U, Th and K
concentrations in raw sediment were determined by ICP-OES and ICP-MS analysis on
samples collected within Layer II (Supplementary Information Table 7). The mean 
radioelement concentration values were used for the age calculations. Combined 
U-series/ESR ages were calculated with DATA, a DOS-based programme™ using 
the US model defined in ref. 55, and considering the following geometry: sediment/ 
brown enamel/grey enamel/dentine. Further details about this dating method as 
applied to fossil teeth may be found elsewhere”. The results of the age calculations 
are shown in Supplementary Information Table 8. 
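
The dose-evaluation logic can be sketched in a few lines. The sketch assumes the commonly used form of the single saturating exponential, I(D) = Isat·(1 − exp(−(D + DE)/D0)), and shows the 1/I² weighting and the Dmax/DE check; it does not implement the Levenberg-Marquardt minimisation itself, and all parameter values and dose steps are hypothetical.

```python
import math


def sse(dose, i_sat, d0, d_e):
    """Single saturating exponential: ESR intensity after an added dose (Gy)."""
    return i_sat * (1.0 - math.exp(-(dose + d_e) / d0))


def weighted_chi_square(doses, intensities, i_sat, d0, d_e):
    """Sum of squared residuals, each weighted by 1/I^2."""
    return sum(((i - sse(d, i_sat, d0, d_e)) / i) ** 2
               for d, i in zip(doses, intensities))


# Hypothetical additive dose steps (Gy) up to the 4019 Gy maximum quoted above.
doses = [0.0, 500.0, 1000.0, 2000.0, 4019.0]
true_params = (1.0, 1500.0, 500.0)  # i_sat, d0, d_e: illustrative values only
intensities = [sse(d, *true_params) for d in doses]

chi2_true = weighted_chi_square(doses, intensities, *true_params)
chi2_off = weighted_chi_square(doses, intensities, 1.0, 1500.0, 600.0)
ratio = doses[-1] / true_params[2]  # Dmax/DE, recommended to lie between 5 and 10
print(chi2_true, chi2_off, ratio)
```

A fitting routine would search for the parameter set that minimises `weighted_chi_square`; the equivalent dose is then the dose-axis intercept, since the modelled intensity is zero at D = −DE.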
Extended Data Figure 1 | Hominin fossil find-locality at Mata Menge.
a, View of Excavation 32 (trench E-32) in 2014, taken towards the north-north-west.
The dip slope visible in the background is the eastern flank of the Welas Caldera,
which was the source for many of the volcanic products deposited in the So’a Basin.
b, Trench E-32A to E viewed towards the southwest, in October 2015. c, E-32D to
E-32E viewed towards the southwest. The irregular erosional upper surface of the
reddish brown palaeosol (Layer III) formed the hardened bedding of a small stream.
The sandy fossil-bearing Layer II infills depressions formed on this bedding
surface. A sequence of mudflows (Layer Ia-f) rapidly covered the entire river
bedding and its exposed banks. d, Mould of a freshwater gastropod (Cerithidea)
from sandy Layer II. e, Detail of the locally developed, gradual boundary between
sandy Layer II and muddy Layer I. Note the abundance of muddy rip clasts around
the transition. At other places, the boundary is sharp. f, West baulk of E-32C.
Large Stegodon florensis bones occur at the boundary between Layers II and I.
Extended Data Figure 2 | Plan and baulk profiles of Excavation 32A-F
showing distribution of finds. The horizontal plan (lower left corner)
shows the horizontal coordinates of individual fossil finds (green crosses)
and stone artefacts (blue diamonds). The position of hominin fossils is
indicated with red stars. In the trench baulk profiles (top and right) only
the projected positions of fossil finds occurring within one meter of the
baulks are plotted. All hominin fossils were recovered from the top of
sandy Layer II. The basal part of the mudflow unit (Layers Ia-e) also
contains fossils, stone artefacts, gastropods, and pebbles. The thick brown
dotted line indicates the western margin of the ancient stream bed.


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 3 plots, panels a-m: glass shard major element (FeO, CaO,
K2O, SiO2; wt %) and trace element (Sr, Th, Zr, Y, Nb, Ce) scatter plots.
Key, a to c, rhyolitic tephra at Mata Menge: upper inter-regional marker (T6;
15-04-3, n = 29); F29, Pu Maso tephra (T-Pu; 15-03-6, n = 22); F28, lower
inter-regional marker (T3; 15-03-5, n = 25); F27, upper bed, Turakeo tephra
(T-T; 15-03-4, n = 25); F26, lower bed, Turakeo tephra (T-T; 15-03-3, n = 19);
F13, Wolo Sege Ignimbrite (T-WSI; 15-03-2, n = 25).]

Extended Data Figure 3 | Glass shard geochemistry. a-c, Selected major
element compositions (weight percent FeO vs K2O, FeO vs CaO and SiO2
vs K2O) of glass shards from key rhyolitic pyroclastic density current
(PDC) and airfall deposits at Mata Menge. d-h, Weight percent FeO
versus CaO composition of glass shards from key rhyolitic pyroclastic 
density current (PDC) and airfall deposits at Mata Menge (in stratigraphic 
sequence from youngest to oldest) compared with correlatives from 
adjacent Soa Basin sites. While the major element glass compositions of 
T-WSI, T-T and T-Pu are all geochemically indistinguishable (they are 
most likely from the same eruptive source) the major element data for each 
of the tephra consistently occupies different overlapping fields. Moreover, 
while subtle geochemical differences exist between T-WSI, T-T, and T-Pu, 
these tephra can also be readily distinguished in the field by a combination 
of stratigraphic position and association, as well as by morphological 
expression. i, j, Selected trace element compositions Sr versus Th and Zr, 
and k-m, Y versus Nb, Ce and Th of glass shards from T3 correlatives 
at Mata Menge, Lowo Mali and Kopowatu as well as T6 (uppermost 
inter-regional marker) from Mata Menge. All trace element concentrations
are in ppm unless otherwise stated. The data are plotted against equivalent
elemental mean and standard deviation (represented as ± 1σ error bars)
reference data from potential distal tephra correlatives (that is, Youngest
Toba Tuff (YTT), Middle Toba Tuff (MTT), Oldest Toba Tuff (OTT)
and Unit E from ODP-758) acquired on the same instrument using the 
same standards and under the same analytical conditions*?”. (YTT data 
source: Pearce, N. J. G., Westgate, J. A. & Gatti, E., Multiple magma batches 
recorded in tephra deposits from the Toba complex, Sumatra. V51F-3102, 
AGU Fall Meeting, San Francisco, 14-18 December 2015). Trace element 
data indicate that the upper (T6) and lower (T3) inter-regional marker 
beds occurring at Mata Menge cannot be geochemically related to any 
known Toba-sourced tephra. On this basis, the eruptive sources of T6
and T3 currently remain unknown. However, this absence of eruptive
source does not diminish their importance within the overall So’a Basin 
stratigraphy. 




[Extended Data Figure 4 plots: a, age probability plot (age in Ma, with relative
probability on the vertical scale); b, inverse isochron plot.]
Extended Data Figure 4 | ⁴⁰Ar/³⁹Ar dating results. a, Age probability
plot for single crystal laser fusion data for hornblende from the Pu Maso
ignimbrite (sample FLO-15-15; Supplementary Information Table 5);
the vertical scale is a relative probability measure of a given age occurring
in the sample (ref. 57). We applied an outlier-rejection scheme to the main
population to discard ages with normalized median absolute deviations
of >1.5 (ref. 58) and these are shown as open circles. %⁴⁰Ar* refers to
the proportion of radiogenic ⁴⁰Ar released for individual analyses. The
weighted mean age of the filtered hornblende data for the Pu Maso
ignimbrite is 0.81 ± 0.04 Ma (1σ; mean square of the weighted deviates
(mswd) = 0.59, prob = 0.93; n = 23/29). b, An inverse isochron plot for
these 23 analyses gives a statistically overlapping age of 0.78 ± 0.07 Ma
(1σ; mswd = 0.6, prob. = 0.92). The ⁴⁰Ar/³⁶Ar intercept of 303 ± 10 is
statistically indistinguishable from the atmospheric ratio of 298.6 ± 0.3
(ref. 59), thus supporting the more precise weighted mean age result.
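
The filtering and averaging summarized in this caption can be sketched as follows. The exact scheme of ref. 58 is not reproduced here; the sketch assumes the common definition in which each age's deviation from the median is divided by 1.4826 times the median absolute deviation, and ages scoring above 1.5 are discarded. The single-crystal ages are hypothetical.

```python
import math
import statistics


def filter_nmad(ages, errors, cutoff=1.5):
    """Keep (age, error) pairs whose normalized deviation from the median <= cutoff."""
    med = statistics.median(ages)
    nmad = 1.4826 * statistics.median([abs(a - med) for a in ages])
    return [(a, e) for a, e in zip(ages, errors) if abs(a - med) / nmad <= cutoff]


def weighted_mean_mswd(pairs):
    """Inverse-variance weighted mean, its 1-sigma error and the MSWD."""
    weights = [1.0 / (e * e) for _, e in pairs]
    total = sum(weights)
    mean = sum(w * a for w, (a, _) in zip(weights, pairs)) / total
    mswd = sum(((a - mean) / e) ** 2 for a, e in pairs) / (len(pairs) - 1)
    return mean, math.sqrt(1.0 / total), mswd


# Hypothetical single-crystal ages (Ma) with 1-sigma errors; the last is an outlier.
ages = [0.78, 0.80, 0.81, 0.82, 0.84, 1.60]
errors = [0.05] * len(ages)
kept = filter_nmad(ages, errors)
mean_age, mean_err, mswd = weighted_mean_mswd(kept)
print(len(kept), round(mean_age, 3), round(mean_err, 3), round(mswd, 2))
```

An MSWD well below 1, as in this toy data set, indicates that the scatter is smaller than the quoted analytical errors would predict.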
Extended Data Figure 5 | ⁴⁰Ar/³⁹Ar dating results. a, Age probability
plot for single crystal laser fusion data for hornblende from the PGT-2
tephra (sample T XII 252-261; Supplementary Information Table 5).
%⁴⁰Ar* ranges from <10% to nearly 60%. The weighted mean age of the
filtered hornblende data for the PGT-2 tephra is 0.65 ± 0.02 Ma (1σ; mean
square of the weighted deviates (mswd) = 0.78, prob = 0.71; n = 17/24).
b, An inverse isochron plot gives a statistically overlapping, but less
precise, age of 0.61 ± 0.04 Ma (1σ; mswd = 1, P = 0.19).
Extended Data Figure 7 | U-series and ESR samples and dating results.
a, b, Hominin incisor (SOA-MM6) crown and root samples (number
3543A and number 3543B, respectively) from Layer II, Mata Menge.
c-e, U-series laser tracks for Stegodon molar samples from Layer II.
f, g, Dose response curves obtained for the two powder enamel samples
from number 3541 and number 3544, respectively. Fitting was carried out
with an SSE function through the pooled mean ESR intensities derived from
each repeated measurement. Given the magnitude of the DE values,
the correct DE value was obtained for 5 < Dmax/DE < 10 (ref. 50).
Extended Data Figure 8 | Carbon and oxygen isotope analysis of
dental enamel. a, δ¹³C and δ¹⁸O values of Stegodon florensis and murine
rodent tooth enamel. All but one of the δ¹³C ratios correspond to a C4
diet, indicating that the analysed Stegodon and murine rodents were
predominantly grazers. The positive shift observed in δ¹⁸O of the younger
Stegodon samples (from the hominin-bearing Layer II) is more difficult
to interpret with the limited data available, but could mean a distinct
source of drinking water (i.e., run-off versus lacustrine) and/or warmer
conditions. b, Bonferroni corrected P values for a pairwise Mann-Whitney
statistical analysis to test for similarity of δ¹³C between subsamples.
c, Bonferroni corrected P values for a pairwise Mann-Whitney statistical
analysis to test for similarity of δ¹⁸O between subsamples; P values
showing significant differences in median values are in bold.
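
The Bonferroni correction applied to the pairwise P values in panels b and c can be sketched as below. The raw P values and the number of comparisons are hypothetical, and the 0.05 significance threshold is an assumption, not a value stated in the caption.

```python
def bonferroni(p_values):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pairwise Mann-Whitney comparisons.
raw_p = [0.004, 0.02, 0.5]
corrected = bonferroni(raw_p)
significant = [p < 0.05 for p in corrected]  # assumed threshold of 0.05
print(corrected, significant)
```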
Extended Data Figure 9 | Analytical data for the Mata Menge stone
technology. a, Artefact counts and provenance, trench E-32 (artefact
definitions after ref. 60). b, Raw materials used to manufacture the stone
tool assemblage, trench E-32. c, Platform types on flakes and modified
flakes, E-32. Cortical: the blow was struck onto the cortical surface of a
cobble. Single-facet: the blow was struck on a scar produced by previous
reduction. Dihedral: the blow was struck on the ridge between two scars
produced by previous reduction. Multifacet: the blow was struck on the
surface of multiple small scars produced by previous reduction. Edge:
the blow was struck on the edge of the core and a platform surface is not
retained on the flake. d, Cortex coverage on the dorsal surface of complete
unmodified flakes, E-32. Percent cortex coverage refers to the proportion
of the dorsal surface covered in cortex. e, Artefact counts, trenches E-32
and E-23/27 (artefact definitions after ref. 60). f, Sizes of artefacts and
attributes, E-32 and E-23/27. g, Raw materials used to manufacture the
stone tool assemblage, E-32 and E-23/27. h, Scatterplot of complete
flake sizes, E-32 (total sample size n = 68 complete flakes) and E-23/27
(n = 443). With regards to raw materials, coarse- and medium-grained
materials include andesite, basalt, rhyolite, and tuff. Fine-grained materials
include silicified tuff, chalcedony, and opal.
Environmental Breviatea harbour mutualistic Arcobacter epibionts


Emmo Hamann, Harald Gruber-Vodicka, Manuel Kleiner, Halina E. Tegetmeyer, Dietmar Riedel, Sten Littmann,
Jianwei Chen, Jana Milucka, Bernhard Viehweger, Kevin W. Becker, Xiaoli Dong, Courtney W. Stairs, Kai-Uwe Hinrichs,
Matthew W. Brown, Andrew J. Roger & Marc Strous


Breviatea form a lineage of free living, unicellular protists, distantly 
related to animals and fungi!?. This lineage emerged almost 
one billion years ago, when the oceanic oxygen content was low, and 
extant Breviatea have evolved or retained an anaerobic lifestyle**. 
Here we report the cultivation of Lenisia limosa, gen. et sp. nov., 
a newly discovered breviate colonized by relatives of animal- 
associated Arcobacter. Physiological experiments show that the 
association of L. limosa with Arcobacter is driven by the transfer of 
hydrogen and is mutualistic, providing benefits to both partners. 
With whole-genome sequencing and differential proteomics, we 
show that an experimentally observed fitness gain of L. limosa could 
be explained by the activity of a so far unknown type of NAD(P)H- 
accepting hydrogenase, which is expressed in the presence, but 
not in the absence, of Arcobacter. Differential proteomics further 
reveal that the presence of Lenisia stimulates expression of known 
‘virulence’ factors by Arcobacter. These proteins typically enable 
colonization of animal cells during infection®, but may in the present 
case act for mutual benefit. Finally, re-investigation of two currently 
available transcriptomic data sets of other Breviatea‘ reveals the 
presence and activity of related hydrogen-consuming Arcobacter, 
indicating that mutualistic interaction between these two groups 
of microbes might be pervasive. Our results support the notion 
that molecular mechanisms involved in virulence can also support 
mutualism®, as shown here for Arcobacter and Breviatea. 

As a cause of genomic innovations and a catalyst of diversification, 
close interactions between eukaryotes and prokaryotes are driving 
forces of evolution’. The importance of eukaryote-prokaryote inter- 
actions is clearly manifested in the remarkable diversity and abundance 
of bacteria that live in symbiosis with large multicellular eukaryotes 
such as animals. The basic adaptive requirements for symbiotic 
bacteria are the capability to recognize and colonize host tissue, evasion 
of defence mechanisms, replication and ultimately the transfer to new 
hosts. Bacteria may have first evolved the capability for symbiotic inter- 
actions with eukaryotes through associations with ancestral protists’. 
Today, several protist lineages are known that have preserved ancestral 
eukaryotic features!*. Characterizing these lineages and their molec- 
ular interactions with bacteria is vital to formulate evidence-based 
hypotheses for the origin and functions of ancestral bacterial-eukaryote 
symbioses. 

By providing nitrous oxide as the only electron acceptor, and the 
bacterium Alteromonas macleodii as prey bacteria, we enriched an 
amoeboid flagellate colonized by spiral-shaped bacteria from an 
anoxic marine tidal-flat sediment. DNA was extracted from the 
enrichment culture and used for metagenomic sequencing (see 
below). Phylogenetic analysis of a concatenated sequence alignment 


comprising 16 universal eukaryotic genes identified the flagellate as a new 
species within Breviatea (Fig. 1a), with Pygsuia biforma as its closest
relative. We designated the protist as the novel genus and species Lenisia 
limosa (see Supplementary Notes for diagnosis, habitat description and 
etymology). 

Lenisia limosa gen. et sp. nov. 

L. limosa is a small, marine amoeboid flagellate with a predatory 
lifestyle (Fig. 1c-i). Its morphology has both swimming and adherent 
gliding forms. Adherent cells are 4-9 µm long and 3-4 µm wide.
Swimming cells are 4-6 µm long and 3-4 µm wide. For swimming,
L. limosa beats its two flagella (anterior flagellum 3-8 µm long and
posterior flagellum usually 7-19 µm long, two to three times longer),
resulting in slow, wobbling locomotion. When it encounters a substrate,
it attaches to it, wraps one flagellum around its lateral side, elongates in
shape and starts gliding. While gliding, the second flagellum remains
detached and assists in the acquisition of prey bacteria. These are
captured with small, filamentous pseudopodia (4-12 µm) originating from
the ventral side of the cell. Ultrastructural analysis showed the presence 
of several key features previously identified in other breviates (Extended 
Data Fig. 1). These features include a complex internal membrane 
system, centrioles, two basal bodies, a Golgi apparatus, digestive 
vacuoles as well as mitochondria-related organelles ('hydrogenosomes').

Phylogenetic analysis of bacterial 16S rRNA gene sequences, com- 
bined with catalysed reporter deposition fluorescence in situ hybrid- 
ization (CARD-FISH), identified the epibionts of L. limosa (typically 
one to three epibionts per cell) as a species of so far uncultivated 
Epsilonproteobacteria of the genus Arcobacter (Fig. 1b, d, e). Arcobacter 
sp. was closely related to several uncultivated species that were found in 
association with marine animals. CARD-FISH showed that Arcobacter 
bacteria successfully evaded ingestion, as they were never detected 
inside cells of L. limosa. Other denitrifying bacteria (related to Colwellia 
psychrerythraea) were also detected, both by metagenomic sequencing 
and by CARD-FISH microscopy, but these did not colonize cells of 
L. limosa. 

We found that the symbiosis was facultative for both partners. 
Provided with dissolved organic carbon, hydrogen and nitrous oxide 
as the only electron acceptor, Arcobacter sp. grew independent of its 
host. However, in the absence of L. limosa, C. psychrerythraea became
much more abundant than Arcobacter (Extended Data Fig. 2). Further 
evidence for the facultative nature of the symbiosis was obtained by 
cultivating L. limosa without nitrous oxide, which resulted in the loss 
of its Arcobacter symbionts (Extended Data Fig. 2). 

Why did Arcobacter sp. colonize L. limosa? To address this question,
we investigated potential metabolic interactions between both
partners with combined metagenomics approaches. We performed
whole-genome and -transcriptome sequencing for the consortium as
a whole and performed proteomics for the consortium and for both
partners grown separately. Raw sequencing reads were assembled
and binned, resulting in a 47 Mb (16× coverage) provisional
whole-genome sequence for L. limosa and a 3 Mb genome (15× coverage) for its
Arcobacter epibiont. For L. limosa we performed evidence-driven gene
prediction and repeat identification using the assembled contigs and a
polyA-tail enriched transcriptome consisting of 48,530 assembled
transcripts (22.5 Mb). We predicted 8,146 protein-coding genes covering
15.6% of the host genome. Both genomes were inferred to be nearly
complete, with 95% of conserved eukaryotic and 99% of conserved
Epsilonproteobacteria genes present (Extended Data Fig. 3).

L. limosa encoded a mosaic of genes enabling fermentative ATP
production (Fig. 2a). Among genes typically supporting aerobic growth,
we identified genes for a partial tricarboxylic acid cycle, an alternative
oxidase, as well as a malate-aspartate shuttle. Despite the presence of
these genes, L. limosa appeared to have lost the capability of oxidative
phosphorylation because it lacked complex IV and the capacity for the
biosynthesis of ubiquinol and cytochrome c. Further, all subunits for
the F-type ATP synthetase (complex V) were absent. Thus, we infer that
the energy metabolism of L. limosa depends strictly on fermentative
ATP production.

Figure 1 | Relatives of animal-associated Arcobacter colonize Breviatea.
a, Maximum likelihood tree of breviates found in association with
Arcobacter (in red). b, MrBayes tree of Arcobacter found in association
with animals or breviates (in red). Dots indicate bootstrap support and
posterior probabilities values, respectively. The scale bars represent
substitution rate per site. c, Scanning electron micrograph of L. limosa
and associated bacteria. Pilus (1), pseudopodial extensions (2), prey
bacteria (3), short anterior flagellum (4), Arcobacter (5), long posterior
flagellum (6). The background of this image was removed and the bacteria
were manually coloured. See Extended Data Fig. 1 for an unmodified
micrograph. d, e, Epifluorescence image of CARD-FISH-labelled L. limosa
(Euk516) and Epsilonproteobacteria (Epsy914). The nucleus was stained
with 4′,6-diamidino-2-phenylindole (DAPI). f, Transmission electron
micrograph of the mitochondria-related organelles (mro), nucleus
(nucl) and extracellular matrix (ex) of L. limosa. g, Mitochondria-related
organelles with inner (im) and outer membrane (om). h, Differential
interference contrast micrograph of L. limosa. i, Scanning electron
micrograph of attached Arcobacter. Each specimen shown represents at
least ten specimens for which images were recorded.

We identified two potential pathways for pyruvate oxidation in 
the genome of L. limosa, and proteomics revealed that the expression 
of either pathway was dependent on the presence/absence of 
Arcobacter sp. In the absence of Arcobacter, L. limosa appeared 
to metabolize pyruvate mainly through the activity of pyruvate 
formate lyase and ethanol dehydrogenase (Fig. 2, gene numbers [2] 
and [3]). Inside the mitochondria-related organelles, reduced ferre- 
doxin was oxidized by a ferredoxin-dependent hydrogen-evolving 
hydrogenase (Fig. 2, number [8]). This pathway does enable recy- 
cling of cytosolic NADH but is not coupled to the production of 
additional ATP. 

In the presence of Arcobacter, L. limosa was inferred to switch to 
a bioenergetically much more efficient metabolism that produces 
hydrogen not only from ferredoxin but also from NADH. Two distinct 
enzymes for NADH oxidation were identified and both were only 
expressed in the presence of the symbiont: a fusion enzyme with an 
NADH/NADPH-accepting domain, homologous to P450 reductase; 
and a Fe-hydrogenase domain, which was inferred to produce hydrogen 
Figure 2 | Symbiotic metabolism of L. limosa and Arcobacter.

a, Symbiotic metabolism of L. limosa and Arcobacter sp. as inferred from 
genomics, transcriptomics and proteomics. We identified two main 
fermentation pathways in L. limosa, of which one was coupled to the 
activity of a NAD(P)H-dependent Fe-hydrogenase (gene number [9], see 
also Extended Data Fig. 4). The latter pathway theoretically yields two 
times more ATP and was only expressed in the presence of hydrogen-oxidizing
Arcobacter. Numbers correspond to gene names and expression values 
listed in b and c. Red circles indicate genes that are more highly expressed 
under syntrophic conditions; blue circles indicate genes that are more 


from NADH in the cytosol (Fig. 2, gene number [9], and Extended 
Data Fig. 4). An enzyme with the same inferred domain structure is 
present in the breviate P. biforma'®. Inside the mitochondria-related 
organelles, an NADH dehydrogenase (Fig. 2, gene numbers [10] and 
[11]) might act together with the ferredoxin-dependent hydrogenase 
(gene number [8]) to produce hydrogen by electron confurcation'® 
(Fig. 2). In combination with pyruvate-ferredoxin-oxidoreductase, 
acetate:succinate CoA-transferase and succinyl-CoA synthetase, 
NADH-dependent hydrogen production would enable the oxidation 
of pyruvate to acetate and CO2 with the production of additional ATP.
Expression of this entire metabolism was strongly stimulated in the 
presence of Arcobacter (Fig. 2). 

While the production of hydrogen is thermodynamically not prone 
to product inhibition with electrons from ferredoxin, the production 
of molecular hydrogen from NAD(P)H only proceeds at low hydrogen 


highly expressed in non-syntrophic conditions. b, Expression levels of 
proteins involved in energy conservation in L. limosa in the presence (red) 
and absence (blue) of Arcobacter. c, Expression levels of proteins involved 
in energy conservation and organic carbon uptake expressed by Arcobacter 
in the presence (red) and absence (blue) of L. limosa. Error bars, s.d. 

from three independent experiments (see also Extended Data Fig. 2). If a
protein consisted of more than one subunit, the average for all subunits is 
shown. Subcellular localization of proteins was inferred from the presence 
of amino (N)-terminal targeting signals. See Supplementary Table 1 for 
gene accession numbers. 


partial pressure (<5.4 µM at NADH/NAD+ = 10)”. The activity of
the pathways inferred above therefore requires an active hydrogen 
sink'’. Our proteomic analysis suggested that Arcobacter acted as a
hydrogen sink by expressing a high-affinity, hydrogen-oxidizing 
Ni/Fe-hydrogenase (Fig. 2, number [16], and Extended Data Fig. 5). 
At a hydrogen turnover rate of 9 x 107!’ mol H per second per cell of
L. limosa, two Arcobacter epibionts should be able to maintain the 
hydrogen concentration in the L. limosa cytosol at ~5 µM. We also
detected the expression of proteins that potentially catalyse the uptake 
and utilization of all other fermentation products inferred to be pro- 
duced by L. limosa. This includes the anabolic uptake of acetate, as well 
as the catabolic oxidation of formate. The absence of essential genes 
supporting autotrophic growth (for example, absence of succinyl- 
coenzyme-A synthetase and citrate lyase) indicates a general depend- 
ency of Arcobacter on organic substrates provided by its host. 
Figure 3 | The fitness of L. limosa depends on its symbiont. Syntrophy
was enabled by the presence of nitrous oxide acting as electron acceptor 
for bacterial hydrogen oxidation. a, Growth of L. limosa in the presence of 
nitrous oxide (syntrophy) compared with prey abundance (Alteromonas) 
and nitrous oxide concentration. b, Growth of L. limosa in the absence 

of nitrous oxide (no syntrophy) compared with prey abundance and 
hydrogen concentration. Cell numbers of L. limosa and concentrations 
were averaged from two independent experiments per treatment, with the 


This is consistent with high expression levels observed for putative 
amino-acid- and fatty-acid-metabolizing enzymes. Our genomic anal- 
ysis also indicated that hydrogen oxidation could in theory proceed in 
the presence of electron acceptors other than nitrous oxide. We found 
all key genes for denitrification, ammonification, as well as the respira- 
tion of fumarate and oxygen in the Arcobacter genome. Stimulation of 
growth of L. limosa/Arcobacter consortia was confirmed experimentally 
for both oxygen and nitrate (Extended Data Fig. 6g). 

Cultivation of L. limosa in the absence and presence of its symbi- 
onts enabled us to directly measure the fitness effects resulting from 
the symbiosis. As a fitness indicator, we measured growth rates and 
growth yields of L. limosa in the absence and in the presence of Arcobacter
(Fig. 3). As expected, absence of a syntrophic partner resulted in an 
accumulation of hydrogen and in a significantly impaired fitness of 
L. limosa. This was apparent from reduced cell numbers, growth rates 
and growth yields. The growth yield of L. limosa was about two times 
higher during syntrophic growth. Negative effects on the fitness of 
L. limosa were also observed when hydrogen was added directly to the 
culture or when hydrogen oxidation was abolished through a respiration 
inhibitor (Extended Data Fig. 6). In combination, our physiological 
experiments demonstrated a true metabolic advantage for L. limosa, provided
by a hydrogen-scavenging partner, as inferred from combined
genomics, transcriptomics and proteomics (Fig. 2). 

To understand how Arcobacter sp. managed to colonize cells of 
L. limosa, we inspected the genome of the epibiont for the presence of 
specific genes previously associated with host recognition and attach- 
ment. We found that the Arcobacter epibiont encoded an almost identical 
set of ‘virulence’ genes previously identified in Arcobacter and 
Campylobacter pathogens of animals. Among these gene products, 
the adhesion protein MOMP (Campylobacter jejuni major outer mem- 
brane protein), CadF and flagellin mediate binding to fibronectin 
type III, which is found in the extracellular matrix of animal cells!*1,
The L. limosa epibiont was found to express these ‘virulence’ genes and, 
for MOMP, flagellin and a subset of chemotaxis proteins, expression 
was stimulated in the presence of L. limosa (Extended Data Fig. 7). 
Chemotaxis proteins were previously shown to enable pathogenic 
Campylobacter to move into the immediate proximity of their target!®. 
Interestingly, we found that L. limosa expressed two fibronectin 
type III domain-containing proteins, potential targets of Arcobacter 
error bars showing the full range between both measurements. For cell
numbers of Alteromonas, determined with CARD-FISH, the error bars 
depict the s.d. of the bacterial cell counts. Growth efficiency was calculated 
from the difference in bacterial and L. limosa cell numbers between two 
different time points twice during exponential growth. All four individual 
results are shown as circles; bar heights indicate averages; error bars, s.d. 
Error bars smaller than data points are not shown. See also Extended 

Data Fig. 6. 


MOMP, CadF and flagellin (Extended Data Fig. 8). Together, these find- 
ings suggest that the mechanism for colonization of L. limosa and animal 
cells by Arcobacter might be conserved at the molecular level. This adds 
to recent evidence that ‘virulence’ genes, although initially described 
for pathogens, may in fact mediate both beneficial and pathogenic 
host-microbe interactions®. 

To investigate whether Arcobacter might also engage beneficially 
with other Breviatea, we screened previously published! transcrip- 
tomic data of P. biforma and Subulatomonas tetraspora for genes affil- 
iated with Arcobacter. In both cases, we identified so far unreported 
Arcobacter 16S rRNA gene sequences (Fig. 1b), as well as transcription 
of high-affinity, hydrogen-oxidizing Ni/Fe-hydrogenases (Extended 
Data Fig. 5). The phylogeny presented in Fig. 1b shows that the three 
Breviatea-associated Arcobacter do not form a monophyletic cluster, 
but are scattered phylogenetically among their animal-associated 
relatives. This could be interpreted in at least two ways. First, the 
symbiosis might have originated once in the ancestor of either 
phylum. Subsequently, Arcobacter co-diversified together with its 
hosts, and radiated to the other phylum multiple times, potentially via 
intermediate, free-living forms. Alternatively, symbiotic Arcobacter 
might have evolved from free-living forms multiple times. In either 
case, this resulted in the use of a similar molecular mechanism for 
colonization (see above). To assess the co-occurrence of Breviatea 
and Arcobacter in present-day marine sediments, currently available 
shotgun metagenomes were screened for the presence or absence 
of L. limosa and other Breviatea, as well as for Breviatea-associated 
Arcobacter. Thirteen out of 25 samples obtained from sediment hori- 
zons favourable for growth of Breviatea/Arcobacter consortia poten- 
tially contained Breviatea (P < 0.01), and 17 potentially contained 
Breviatea-associated Arcobacter, including all samples positive for 
Breviatea (Extended Data Table 1). Although limited metagenomic 
data are available at present, they are consistent with the ecological 
interaction of Breviatea and Arcobacter observed for the enriched 
consortia. The evolutionary roots of this interaction can only be resolved 
by future investigation of more examples of Breviatea/Arcobacter
consortia. 

In conclusion, we have shown that L. limosa is a newly identi- 
fied anaerobic protist, colonized by hydrogen-oxidizing Arcobacter. 
This colonization provides benefits to both partners via interspecies 
hydrogen transfer, which enables the activity of a newly identified
NADH-dependent fusion hydrogenase in L. limosa, leading to 
increased ATP yield. The molecular mechanism of colonization 
may involve specific interactions between L. limosa fibronectin type 
III domain-containing proteins and Arcobacter proteins similar to
‘virulence’ factors previously shown to mediate infection of animal 
tissue by related bacteria. The detection of Arcobacter genes in tran- 
scriptomes of other Breviatea shows that these protists probably engage 
in similar symbioses. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Cultivation of L. limosa. Sediment samples for the initial enrichment of 
L. limosa were obtained from a tidal flat in the German Wadden Sea (53.73585° N, 
7.69905° E). For the enrichments, 100 ml laboratory bottles (DURAN Glastechnik) 
were filled with 50 ml sediment from different depths (0.5-2 cm and 6-8 cm). The 
bottles were filled with cultivation medium and closed without creating a headspace.
The cultivation medium was based on HEPES-buffered seawater (34 g l⁻¹,
1 mM HEPES, pH 8) (Red Sea Deutschland) and contained prey bacteria for
protists (final density approximately 10° cells per millilitre), as well as either nitrate, 
nitrite or nitrous oxide (0.2 mM final) as electron acceptor for bacterial denitrifica- 
tion. To allow for growth of micrometre-scaled protists, the small rod-shaped, strictly 
aerobic gammaproteobacterium A. macleodii (strain ATCC 27126, 0.6-0.8 µm
× 1.4-2 µm in size) was selected as prey bacterium. Prey bacteria were grown on
solid marine broth medium (ATCC medium 2216). The cells were washed three 
times by serial centrifugation in sterile seawater before they were added to the 
enrichment cultures. Enrichment cultures for protists were incubated in the dark 
at 22°C and the nitrate as well as nitrite concentrations were regularly monitored. 
Depletion of electron acceptor was prevented by periodically adding small portions 
of anoxic concentrated stock solutions of nitrate, nitrite (from a 200 mM stock) or 
pure N2O to the cultures. Depletion of food was prevented by adding portions of 
anoxic suspensions of prey bacteria to the cultures. Growth of protists was regularly 
inspected by light microscopy. When enrichment was observed, a subsample of 
culture was transferred to sediment-free anoxic medium for further investigations.
To best preserve the microbial community naturally associated with L. limosa, 
we did not isolate single protists. Instead we performed serial transfers into fresh 
medium until the presence of only one species could be confirmed via Sanger 
sequencing of 18S rDNA genes and light microscopy. For all results presented in 
this study we only used one strain (strain LL-12) obtained from an enrichment 
culture with nitrous oxide. This strain was maintained in medium containing 2 mM
nitrous oxide and transferred weekly. For the cultivation of L. limosa in the absence 
of Arcobacter, we grew a subculture in medium containing no nitrous oxide. This 
culture was transferred once before proteomic analysis (see below). 

Cultivation of Arcobacter. For the cultivation of Arcobacter in absence of L. limosa, 
we first incubated L. limosa/Arcobacter consortia in anoxic medium containing 
dissolved nutrients but no prey bacteria. The cultivation medium was based 
on HEPES-buffered seawater (34 g l⁻¹, 1 mM HEPES, pH 8) and supplemented
with the following compounds: sodium acetate (10 mM), yeast extract (0.1 g l⁻¹),
sodium phosphate (2 mM) and ammonium chloride (2 mM). The medium was 
filled in air-tight cultivation bottles and made anoxic by flushing it with a mixture 
of 5% H2, 10% CO2 and 85% N2. Afterwards, part of the culture headspace was
replaced with nitrous oxide to achieve a nitrous oxide concentration of 5 mM in the 
liquid. Under the given conditions no growth of L. limosa could be observed. The 
enrichment cultures were transferred twice before proteomic analysis (see below). 
Physiological experiments. Several growth experiments were performed to evaluate 
the importance of bacterial hydrogen oxidation for the fitness of L. limosa (Fig. 3 
and Extended Data Fig. 6). As electron acceptor for hydrogen oxidation we used 
nitrous oxide, which is the last intermediate in the serial denitrification pathway. 
Nitrous oxide is non-toxic and its reduction does not produce any intermediates. 
Thus, it allows for an easy and unambiguous determination of denitrification 
rates by measuring the consumption of nitrous oxide in the medium. To compare 
the growth efficiency of L. limosa under different conditions, we determined cell 
numbers, growth yields as well as respiration rates. To determine cell numbers of 
L. limosa, 50 µl subsamples of liquid culture were mixed with formaldehyde solution
(0.02% final), which immobilized swimming cells. The cell abundance was
determined by manually counting with an improved Neubauer counting chamber 
(BRAND counting chamber, Neubauer improved, 0.1 mm depth). For bacterial 
cell counts, cells were fixed in 1.8% formaldehyde solution (prepared in 34 g l⁻¹
sterile seawater, 1 mM HEPES, pH 8) for 2h at room temperature (~20°C), filtered 
onto 0.22 µm white polycarbonate filters (supported by 0.45 µm cellulose nitrate
filter, Whatman) and stained with the DNA-specific stain DAPI (1 µg ml⁻¹, 3 min,
at 37°C). The filters were evaluated by epifluorescence microscopy (Leica DM:
Osram centra mercury-vapour lamp) and the cells were manually counted at 
×1,000 magnification.

Growth yields were calculated by measuring the bacterial cell numbers as well as 
the eukaryotic cell numbers between two different time points during exponential 
growth. The difference in bacterial abundance divided by the difference in
eukaryotic cell numbers provided a proxy for the gross growth efficiency.
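
The growth-yield proxy described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis script used in the study: the function name, the sign convention (prey consumed counted as a positive number) and the cell counts in the usage lines are all hypothetical.

```python
def prey_consumed_per_new_cell(prey_start, prey_end, protist_start, protist_end):
    """Proxy for gross growth efficiency between two time points.

    Divides the change in bacterial (prey) abundance by the change in
    eukaryotic (protist) cell numbers. All counts are cells per millilitre.
    Assumption: prey consumed is taken as prey_start - prey_end.
    """
    prey_consumed = prey_start - prey_end
    protists_gained = protist_end - protist_start
    if protists_gained <= 0:
        raise ValueError("protist numbers must increase between the two time points")
    return prey_consumed / protists_gained


# Hypothetical counts (cells per ml), not data from the study.
ratio = prey_consumed_per_new_cell(1.0e8, 6.0e7, 1.0e4, 5.0e4)
print(f"{ratio:.0f} prey cells consumed per new protist cell")
```

Under this convention a lower ratio means that fewer prey cells were needed per new protist cell, that is, a higher growth yield.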

Respiration rates were determined by adding 13C-enriched A. macleodii bacteria
to an exponentially growing culture of L. limosa. Digestion of such labelled cells 
leads to the production of 13C bicarbonate, which was measured after conversion 
to 13CO2 (see Chemical analysis). To investigate the growth of L. limosa in the
presence of antibacterial antibiotics, ampicillin and streptomycin (0.1 mg ml⁻¹
each) were added to the medium. Since these antibiotics act on growing cells, they 


did not affect the availability of food bacteria, which were provided in excess at 
the start of the incubation. 

To inhibit denitrifying activity, we added 5% acetylene final to the headspace 
of the cultures and dissolved it by gently shaking the cultures. Acetylene inhibits 
nitrous oxide reductase, the enzyme complex that mediates nitrous oxide 
respiration. 

To test the effect of nitrate on the growth of L. limosa, we added 2 mM sodium
nitrate to the cultivation medium. In addition, we incubated a culture in medium 
that contained 0.2 mM dissolved oxygen. Cell numbers of L. limosa in such treated 
cultures were compared with a culture provided with nitrous oxide (2.2 mM) 
and with a control culture that did not contain an electron acceptor for hydrogen 
oxidation. 
Chemical analysis. Nitrous oxide and hydrogen were measured from the gas
headspace of the cultures using a GAM 400 mass spectrometer (In Process 
Instruments). Volatile fatty acids were measured with a Sykam HPLC system
(Fürstenfeldbruck) equipped with an Aminex HPX-87H HPLC column
(300 × 7.8 mm) and 5 mM H2SO4 as eluent. Separation was performed in
isothermal mode at 40°C and the eluted compounds were simultaneously detected 
with an ultraviolet and a refractive index detector at a detection limit of 0.1 mM. 
As calibration standard, a mixture of the fatty acids succinate, lactate, formate, 
acetate, propionate and butyrate was measured at different concentrations. 

Respiration rates were measured as dissolved inorganic 13C released by the
digestion of 13C-enriched prey bacteria. Subsamples (1 ml) were poisoned with
0.2% zinc acetate solution at different time-points during the experiments. The
isotopic composition of dissolved inorganic 13C was determined after acidifying with
hypo-phosphoric acid (1% final) and analysed on a gas chromatography-isotope
ratio monitoring mass spectrometer (Optima Micromass). To convert isotopic
compositions to concentrations (moles per litre), standard solutions of NaHCO3
with known dissolved inorganic 13C concentrations were measured.
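
The conversion from instrument readouts to concentrations via standards can be illustrated with a simple linear calibration. The sketch below assumes that the readout is linear in concentration over the range of the standards, which the text does not state; the function names and the numbers in the usage lines are hypothetical.

```python
def fit_calibration(known_concentrations, readouts):
    """Least-squares line readout = slope * concentration + intercept."""
    n = len(known_concentrations)
    if n < 2 or n != len(readouts):
        raise ValueError("need at least two standards with one readout each")
    mean_c = sum(known_concentrations) / n
    mean_r = sum(readouts) / n
    sxx = sum((c - mean_c) ** 2 for c in known_concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((c - mean_c) * (r - mean_r)
              for c, r in zip(known_concentrations, readouts))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_c
    return slope, intercept


def readout_to_concentration(readout, slope, intercept):
    """Invert the calibration line for an unknown sample."""
    if slope == 0:
        raise ValueError("calibration slope is zero; cannot invert")
    return (readout - intercept) / slope


# Hypothetical standards (concentration, readout), not values from the study.
slope, intercept = fit_calibration([0.0, 1.0, 2.0, 4.0], [0.1, 2.1, 4.1, 8.1])
sample_concentration = readout_to_concentration(5.1, slope, intercept)
```

Samples whose readout falls outside the range covered by the standards would be extrapolated by this line and should be treated with caution.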
CARD-FISH. CARD-FISH was performed on polycarbonate filters as described 
elsewhere’”. For the hybridization of ribosomal RNA we used the following probes. 
For eukaryotes, Euk516, ACCAGACTTGCCCTCC (5′-3′, 0% formamide); for
Epsilonproteobacteria, Epsy914, GGTCCCCGTCTATTCCTT (5′-3′, 35%
formamide); and for Alteromonas, Alt184, CCCGTTTGGTCCGAAGAC (5′-3′,
25% formamide). Before microscopic evaluation, all samples were counter-stained
with DAPI and embedded in a 3:1 mixture of Citifluor-VectaShield (Citifluor/
Vector Labs). Cells were imaged at ×1,000 magnification with an epifluorescence
microscope (AxioSkop 2 MOT Plus, Carl Zeiss) connected to an AxioCam MRm 
camera (Carl Zeiss). 
Electron microscopy. For transmission electron microscopy, cells were harvested 
at 2,000 r.p.m. (260g) using a Stat Spin Microprep 2 table-top centrifuge. After
centrifugation, the pellet was vitrified in a BAL-TEC HPM-010 high-pressure freezer.
The samples were substituted at −90°C in a solution containing 0.1% tannic acid
and 0.5% glutaraldehyde in anhydrous acetone for 72h, and for an additional 8h in
2% OsO4 in anhydrous acetone. After a further incubation over 20h at −20°C,
samples were warmed up to +4°C and washed with anhydrous acetone. The sam- 
ples were embedded at room temperature in Agar 100 (Epon 812 equivalent) at 
60°C over 24h. After ultrathin sectioning (60 nm), sections were counter-stained 
with lead citrate. Samples were analysed with a Philips CM 120 transmission elec- 
tron microscope (Philips) and images were taken with a TemCam F416 CMOS 
camera (TVIPS). 

For scanning electron microscopy, cells were harvested at 400 r.p.m. for 6 min 
using a table-top centrifuge. After centrifugation, the pellet was placed on Teflon 
slides. We then waited 5 min to allow attachment of cells to the slide and fixed them 
with 2% glutaraldehyde solution (in 34 g l⁻¹ sterile seawater, 1 mM HEPES, pH 8)
for 60 min at room temperature. Fixation was followed by a washing step in MilliQ- 
water and an ethanol dehydration series in 30%, 50%, 70%, 90% and 100% ethanol 
(20 min each). Finally, the specimens were subjected to critical-point drying with 
CO2 to remove any volatile solvents. The objects were stored in a silica-filled
desiccator until microscopic evaluation. Imaging was performed with a Nova NanoLab
600 scanning electron microscope (FEI). For better identification of bacteria 
(in Fig. 1c), colours were manually added and the background was removed using 
the imaging processing software GNU Image Manipulation Program (the GIMP 
team, version 2.8.14). 

Percoll density-gradient centrifugation. Enrichment of L. limosa/Arcobacter 
consortia from suspended bacterial cells was done by Percoll (Sigma-Aldrich)-based
density-gradient centrifugation. To form a density gradient, 4.5 ml Percoll
was mixed with 3.5 ml buffered sterile seawater (1 mM HEPES), filled in 10 ml 
centrifugation tubes and centrifuged for 30 min at 10,000g and 15°C. Afterwards, 
1 ml of culture was carefully loaded on top of the gradient and immediately
centrifuged for another 10 min at 8,000g at 15°C. Cells of L. limosa were enriched in a
dim white band in the upper third of the gradient, approximately 0.5 cm above the 
bacteria. The L. limosa fraction was diluted in 50 ml sterile seawater and washed 
twice by serial centrifugation at 800g for 5 min (15°C) to remove Percoll and
suspended bacteria. The enriched L. limosa fraction was pelleted, frozen and stored 
at −80°C until DNA extraction.

Next-generation sequencing and in silico procedures. Shotgun metagenomic 
Illumina libraries were constructed from genomic DNA of enriched L. limosa/ 
Arcobacter consortia (see Percoll density-gradient centrifugation). Sequencing 
yielded ~7.5 million raw 2 × 250 bp paired-end MiSeq reads. Reads were quality
trimmed to Q10 and filtered to a minimum length of 99 nucleotides using Nesoni 
(https://github.com/Victorian-Bioinformatics-Consortium/nesoni). Initial assem- 
bly of the quality-filtered reads was performed with Velvet'®. Bacterial genomes were 
then binned on the basis of percentage GC content, tetranucleotide composition
and sequence coverage using Metawatt'’. The binning yielded provisional whole- 
genome sequences for L. limosa, Arcobacter as well as for co-enriched Lacinutrix 
and Colwellia. For a refined assembly of the L. limosa genome, we filtered out all 
reads that mapped to the bacterial bins using Bowtie2 (ref. 20). The remaining 
reads were then used to generate a final assembly for L. limosa, using Velvet and 
Gapfiller”!. This assembly was 48.1 Mb in size and had an N50 value of 9.5 kb. The 
length of the longest scaffold was 135.5 kb. Structural annotations and gene pre- 
dictions for L. limosa were performed with the MAKER pipeline”. After an initial 
ab initio gene prediction with GeneMark-ES”*, we refined the obtained gene models 
with evidence-driven gene predictions using Snap* and 48,530 assembled tran- 
scripts of a polyA-tail enriched transcriptome (see below). Repeat identifica- 
tion, annotation and masking were implemented by RepeatMasker”*. Genome 
completeness was estimated by representation of 159 universal eukaryotic genes 
previously identified in P. biforma! using hidden Markov model-based searches 
with hmmer and blast homology searches. Functional annotation was performed 
with the KEGG automatic annotation server”’. For the core metabolism of 
L. limosa, we manually validated the correct prediction and annotation of each 
gene model by blastx and blastp homology searches against the UniProtKB/Swiss- 
Prot protein database. Translocation of gene products to mitochondria-related 
organelles was predicted on the basis of the presence of N-terminal target peptides 
using TargetP*’ and MitoPROT*. Transcriptional activity of genes was evaluated 
by mapping the reads of a polyA-tail-enriched transcriptome to the predicted
nucleotide gene models. Protein architectures and conserved protein domains 
were identified using HMMER”, the Pfam* database and the SMART*! protein 
domain detection tools. 
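
The N50 value quoted above for the final L. limosa assembly is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name and the scaffold lengths in the usage line are hypothetical and are not taken from the assembly.

```python
def n50(scaffold_lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and summed until the
    running total reaches half of the total assembly length; the length
    of the scaffold that crosses this threshold is the N50.
    """
    if not scaffold_lengths:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(scaffold_lengths) / 2
    running_total = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length


# Hypothetical scaffold lengths in base pairs.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because the statistic depends only on the length distribution, it says nothing about assembly correctness and is best read alongside the completeness estimates described above.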

For the Arcobacter genome, gene predictions and functional annotations were 
performed using the online annotation-pipeline RAST**. Completeness of the 
Arcobacter genome was evaluated with CheckM® using the implemented set of 
conserved Epsilonproteobacteria reference genes. For the core metabolism and 
virulence related genes, we manually validated the correct prediction and anno- 
tation of each gene model with blastx and blastp homology searches against the 
UniProtKB/Swiss-Prot protein database. 

For mRNA sequencing, RNA was preserved in RNAlater and was extracted as
previously described*4. Extracted RNA was treated with RQ1 DNase (Promega),
purified with RNeasy MinElute columns (Qiagen) and stored in TE buffer 
at −80°C. Approximately 1 µg of total RNA was used for preparation of an mRNA
sequencing library following the Illumina TruSeq RNA Sample Preparation 
version 2 guide, using poly-T oligo-attached magnetic beads to enrich eukaryotic 
mRNA. The RNA library was sequenced on a MiSeq instrument in a 2 × 250 bp
paired-end run. 
Phylogenetic analysis. Eukaryotic phylogenies were determined with an alignment 
consisting of 16 universal eukaryotic genes from 88 different taxa covering all 
major eukaryotic groups as previously published*>. The alignment was comple- 
mented with genes from L. limosa, Pygusia biforma, Breviata anathema as well as 
S. tetraspora. Alignments were calculated with MAFFT** and phylogenies were 
constructed with RAxML*’ using the GTR+GAMMA model for the SSU-rDNA
partition and the WAG replacement matrix with maximum likelihood estimated 
base frequencies for the amino-acid partitions. We performed 400 rapid bootstrap 
iterations followed by a search for the best-scoring maximum likelihood tree. 

Arcobacter phylogenies were calculated using 16S rRNA sequences as phyloge- 
netic marker genes. Alignments were constructed with MAFFT. Phylogenetic tree 
calculation was performed by bayesian inference using the software MrBayes** with 
the GTR substitution model and gamma rate variation. 

Phylogenetic analysis for the putative NAD(P)H-dependent Fe-hydrogenase 
was done individually for the NAD(P)H-accepting domain and the Fe-hydrogenase 
domain. For this, closely related sequences were obtained from the NCBI non- 
redundant protein database and used to construct a sequence alignment with 
MAFFT. Phylogenies were inferred using RAxML and the WAG replacement 
matrix with fixed base frequencies. We ran 600 rapid bootstrap iterations followed 
by a search for the best scoring maximum likelihood tree. 

Protein extraction and peptide preparation. For proteomics, three parallel cul- 
tures of L. limosa were grown with nitrous oxide in the presence of Arcobacter and 
three cultures were grown without nitrous oxide and in the absence of Arcobacter.
In addition, three cultures of Arcobacter were grown with nitrous oxide and in the 
absence of L. limosa (see Cultivation for details on medium composition). The 
cultures were harvested during the early exponential growth phase and immediately
frozen at −80°C.

For each of the three treatments, we prepared tryptic digests from three
biological replicates following the filter-aided sample preparation protocol described in
ref. 39. In brief, SDT-lysis buffer (4% (w/v) SDS, 100 mM Tris-HCl pH 7.6, 0.1M 
DTT) was added in a 1:10 sample:buffer ratio to the sample pellets. Samples were 
heated to 95°C for 10 min followed by pelleting of debris for 5 min at 21,000g. 
Thirty microlitres of the cleared lysate were mixed with 200 µl of UA solution
(8M urea in 0.1 M Tris/HCl pH 8.5) in a 10kDa MWCO 500 centrifugal filter
unit (VWR International) and centrifuged at 14,000g for 40 min. Two hundred 
microlitres of UA solution were added again and the centrifugal filter was spun at 14,000g
for 40 min. One hundred microlitres of IAA solution (0.05 M iodoacetamide in 
UA solution) were added to the filter and incubated at 22°C for 20 min. The IAA 
solution was removed by centrifugation and the filter was washed three times by 
adding 100 µl of UA solution and then centrifuging. The buffer on the filter was
then changed to ABC (50 mM ammonium bicarbonate), by washing the filter three
times with 100 µl of ABC. Two micrograms of MS grade trypsin (Thermo Scientific
Pierce) in 40 µl of ABC were added to the filter, and filters were incubated overnight
in a wet chamber at 37°C. The next day, peptides were eluted by centrifugation at
14,000g for 20 min, followed by addition of 50 µl of 0.5 M NaCl and again
centrifugation. Peptides were de-salted using C18 spin columns (Thermo Scientific Pierce)
according to the manufacturer's instructions. Approximate peptide concentrations 
were determined using a Pierce Micro BCA assay (Thermo Scientific Pierce). 
One-dimensional liquid chromatography-tandem mass spectrometry. Samples 
were analysed by one-dimensional liquid chromatography-tandem mass spec- 
trometry (LC-MS/MS) using a block-randomized design as previously described”. 
Two blank runs were done between samples to reduce carry over. For each sample, 
a technical replicate was run. For each run, 800 ng of peptide were loaded onto a 
2 cm, 75 µm ID C18 Acclaim PepMap 100 pre-column (Thermo Fisher Scientific)
using an EASY-nLC 1000 Liquid Chromatograph (Thermo Fisher Scientific) set up
in two-column mode. The pre-column was connected to a 50 cm × 75 µm analytical
EASY-Spray column packed with PepMap RSLC C18, 2 µm material (Thermo
Fisher Scientific), which was heated to 35°C via the integrated heating module. 
The analytical column was connected via an Easy-Spray source to a Q Exactive 
Plus Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific). 
Peptides were separated on the analytical column at a flow rate of 225 nl min⁻¹
using a 260 min gradient going from buffer A (0.2% formic acid, 5% acetonitrile) 
to 20% buffer B (0.2% formic acid in acetonitrile) in 200 min, then from 20% to 
35% buffer B in 40 min and ending with 20 min at 100% buffer B. Eluting peptides 
were ionized with electrospray ionization and analysed in Q Exactive Plus. Full 
scans were acquired in the Orbitrap mass spectrometer at 70,000 resolution. 
MS/MS scans of the 15 most abundant precursor ions were acquired in the 
Orbitrap mass spectrometer at 17,500 resolution. The mass (m/z) 445.12003 was 
used as lock mass as described in ref. 41 with the modification that lock mass 
was detected in the full scan rather than by separate selected ion monitoring scan 
injection. Lock mass use was set to ‘best’. Ions with charge state +1 were excluded 
from MS/MS analysis. Dynamic exclusion was set to 30s. Roughly 160,000 MS/MS 
spectra were acquired per sample run. 
Protein identification, quantification and statistics. For protein identification, 
a custom protein sequence database containing 28,966 proteins, predicted from 
the L. limosa, Arcobacter sp., Alteromonas, Colwellia and Lacinutrix provisional 
whole-genome sequences, was used. The database was submitted to the PRIDE 
repository (see below). For protein identification, technical replicates were com- 
bined and MS/MS spectra were searched against the database using the Sequest 
HT node in Proteome Discoverer version 2.0.0.802 (Thermo Fisher Scientific) with 
the following parameters: trypsin (full), maximum two missed cleavages, 10 p.p.m. 
precursor mass tolerance, 0.6 Da fragment mass tolerance and maximum three 
equal dynamic modifications per peptide. The following three dynamic modi- 
fications were considered: oxidation on M (+15.995 Da), carbamidomethyl on 
C (+57.021 Da) and acetyl on protein N terminus (+42.011 Da). False discovery 
rates (FDRs) for peptide spectral matches (PSMs) were calculated and filtered using 
the Percolator Node in Proteome Discoverer. The Percolator algorithm uses 
semi-supervised learning and a decoy database search strategy to learn to distin- 
guish between correct and incorrect PSMs. Percolator was run with the following 
settings: maximum delta Cn 0.05, a strict target FDR of 0.01, a relaxed target FDR 
of 0.05 and validation based on q value. 

Search results for all samples were combined into a multiconsensus report with 
Proteome Discoverer and additional filtering criteria applied on the protein level, 
which were at least six PSMs per protein, at least one unique peptide and only 
PSMs with a concatenated rank of 1. This resulted in the following overall FDRs: 
0.6% for PSMs, 2.6% for peptides and 5% for proteins. The multiconsensus report 
was then exported as a tab-delimited file for further processing. 

For protein quantification, normalized spectral abundance factors (NSAFs) were 
calculated on the basis of PSMs using the method described previously and multi- 
plied by 10,000. The NSAF x 10,000 value gives the relative abundance of a protein 
in a sample as a fraction of 10,000. For statistical analyses of differences between 
treatments, L. limosa and Arcobacter proteins were analysed separately: that is, 
separate tables were generated. For both organisms the set of proteins was reduced 
by only including proteins that had at least three NSAF values greater than 0 
across all nine samples (three treatments x three biological replicates) and at least 
one NSAF value greater than 0 in each of the treatments that were compared in the 
respective statistical test. The NSAF x 10,000 values in these reduced tables were 
then again normalized to 10,000 for each sample to generate organism-specific 
NSAF values (orgNSAFs) as described in ref. 44. This normalization procedure 
ensures that differences in organism abundance across samples do not lead to the 
false detection of differentially expressed proteins. The tables with orgNSAFs were 
loaded into Perseus software (version 1.5.1.6, http://www.perseus-framework.org/ 
doku.php) and log2 transformed. Missing values produced by log2(0) were replaced 
by sampling from a normal distribution assuming that the missing values were on 
the lower end of abundance (normal distribution parameters in Perseus: width 0.3, 
down shift 1.8, do separately for each column). A t-test with permutation-based 
FDR calculation was used to detect proteins that differed significantly in their 
expression level between two treatments. The statistical method implemented 
in Perseus that we used was based on the ‘significance analysis of microarrays’ 
described in ref. 45, which by using a permutation-based FDR, accounted for the 
multiple-testing problem inherent in testing for significant expression differences 
for a large number of genes. The following parameters were used for the test: group- 
ings were not preserved for randomizations, both sides, 250 randomizations, FDR 
of 1% and s0 of 0.1. 
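
The two normalization steps described above can be illustrated with a minimal sketch. It assumes the standard NSAF definition (PSM count divided by protein length, normalized to the sum over all proteins in the sample); the protein identifiers, counts and lengths are hypothetical and serve only as an example, not as the authors' pipeline.

```python
def nsaf_x10000(psm_counts, lengths):
    """NSAF x 10,000 for one sample.

    psm_counts: dict protein -> number of PSMs
    lengths:    dict protein -> protein length in amino acids
    """
    # Spectral abundance factor: PSMs per unit of protein length.
    saf = {p: psm_counts[p] / lengths[p] for p in psm_counts}
    total = sum(saf.values())
    if total == 0:
        return {p: 0.0 for p in saf}
    return {p: 10_000 * v / total for p, v in saf.items()}


def org_nsaf(nsaf, organism_proteins):
    """Renormalize one organism's proteins so that they sum to 10,000."""
    subset = {p: nsaf[p] for p in organism_proteins if p in nsaf}
    total = sum(subset.values())
    if total == 0:
        return {p: 0.0 for p in subset}
    return {p: 10_000 * v / total for p, v in subset.items()}


# Hypothetical sample with two L. limosa proteins and one Arcobacter protein.
psms = {"LL_0001": 30, "LL_0002": 10, "ARC_0001": 20}
lens = {"LL_0001": 300, "LL_0002": 100, "ARC_0001": 100}

nsaf = nsaf_x10000(psms, lens)
ll = org_nsaf(nsaf, ["LL_0001", "LL_0002"])
```

Because the second step rescales each organism's proteins to the same total, a change in the abundance of the organism itself does not appear as a change in the relative expression of its proteins.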

No statistical methods were used to predetermine sample size. The investigators 
were not blinded to allocation during experiments and outcome assessment. The 
experiments were not randomized except for the proteomics approach (see section 
on one-dimensional liquid chromatography-tandem mass spectrometry). 
Screening of sediment metagenomes. A SRAdb R/Bioconductor query of the 
NCBI Sequence Read Archive (SRA) with search term ‘marine sediment’ yielded 
130 shotgun metagenomes. After downloading and converting the SRA files to 
FASTQ format, the short (<300 bp) FASTQ reads were mapped onto the assembled 
contigs (only those >5 kbp) of L. limosa and Arcobacter, using BBMap with 
parameters ‘ambiguous=random qtrim=lr trimq=10 minid=0.90’. The remaining 
long (>300 bp) reads were quality trimmed using BBMap with parameters 
‘qtrim=lr trimq=10’ before mapping onto the same contigs using BWA-SW with default 
settings. For robustness of phylogenetic classification, the low-complexity reads 
identified using sga with parameters ‘preprocess --dust --dust-threshold=4’ were 
filtered out from the mapped reads; with diamond blastx the remaining mapped 
reads were searched against a database containing amino-acid sequences of genus- 
level (Bacteria) or family-level (Eukarya) representatives of all forms of life, 
including L. limosa and other Breviatea, as well as L. limosa-affiliated Arcobacter, 
A. nitrofigilis and seven other Arcobacter. Only reads with hits to Breviatea, 
L. limosa-affiliated Arcobacter and A. nitrofigilis, and only hits within 10% of the 
best bit-score with e-value lower than 0.01 and min bit-score at least 50, were con- 
sidered. Distribution of positive hits over multiple loci of L. limosa and Arcobacter 
sp. genome sequences was verified manually. Of the 130 available metagenomes, 15 
had to be rejected because they were not obtained from marine sediments, and 65 
were rejected because they contained too few reads to enable detection of L. limosa. 
Of the remaining 50 metagenomes, 25 were obtained from the top of the sediment, 
a potentially favourable habitat for L. limosa. The remaining 25 were obtained from 
deeper sediment horizons with probably unfavourable conditions. For the latter, 
we calculated an average false positive rate for Breviatea detection of 0.004 reads 
per 1 million reads. This false positive rate is probably overestimated, because we 
cannot completely exclude the possibility that the metagenomes from unfavourable 
conditions contained Breviatea or were contaminated during sampling. The false positive 
rate was used to calculate, for each sample, using a binomial distribution, the 
probability that the actual number of reads assigned to Breviatea was coincidental 
and not related to the potential presence of Breviatea. We did not apply the same 
statistical procedure to the read maps for Arcobacter, because classifying samples 
as unfavourable for Arcobacter was not possible since Arcobacter can occupy many 
different ecological niches. Extended Data Table 1 presents the results and SRA 
accession numbers for the analysed metagenomes. 
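
The binomial test described above can be sketched as follows. This is an illustration under stated assumptions: the P value is taken to be the upper-tail probability of observing at least the assigned number of reads when each read is a false positive with probability 0.004 per million, and the helper name `binom_sf` is our own.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    The lower tail is summed with the recurrence
    pmf(i + 1) = pmf(i) * (n - i) / (i + 1) * p / (1 - p),
    which is adequate when n * p is small, as it is here.
    """
    if k <= 0:
        return 1.0
    pmf = math.exp(n * math.log1p(-p))  # P(X = 0)
    lower = 0.0
    for i in range(k):
        lower += pmf
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - lower)


# False positive rate from the unfavourable habitats: 0.004 reads per million.
FALSE_POSITIVE_RATE = 0.004e-6

# Example using the read count of SRR2638107 and roughly 37 assigned reads
# (0.043 per million x 869,558,078 reads).
n_reads = 869_558_078
assigned = 37
expected_by_chance = n_reads * FALSE_POSITIVE_RATE
p_value = binom_sf(assigned, n_reads, FALSE_POSITIVE_RATE)
```

With only a few reads expected by chance in a library of this size, several dozen assigned reads yield a very small P value, in line with the values reported for the upper-sediment samples in Extended Data Table 1.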

Bioenergetics and hydrogen fluxes. Per-cell hydrogen production rates were 
calculated from nitrous oxide consumption rates and cell numbers presented 
in Fig. 3, assuming a stoichiometry of 1:3 mol H2:mol N2O (Fig. 2). The hydrogen 
concentration sustained in the mitochondria-related organelles of L. limosa 
by a single Arcobacter epibiont was calculated to be 10 µM with the equation 

$$F = \frac{D}{d} \times C \times A$$

with $F$, the hydrogen flux (mol H2 s$^{-1}$; $\sim 9 \times 10^{-18}$ mol s$^{-1}$); 
$D$, the diffusivity of H2 in water ($\sim 4.6 \times 10^{-9}$ m$^{2}$ s$^{-1}$); $d$, the distance between 
the H2 source and the H2 sink ($2.1 \times 10^{-6}$ m); $C$, the H2 concentration in the 
mitochondria-related organelles (mM or mol m$^{-3}$); and $A$, the area of diffusion 
($0.4 \times 10^{-12}$ m$^{2}$). The maximum hydrogen concentration enabling the reaction 
NADH + H⁺ = NAD⁺ + H2 ($\Delta G = +18$ kJ mol$^{-1}$ (ref. 13)) was calculated to be 
5.4 µM (with Henry’s law constant for H2 = $6.33 \times 10^{9}$ Pa and an NADH/NAD⁺ 
ratio of 10 (ref. 50)). The calculations are presented in Supplementary Table 2. 
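
As a plausibility check, the diffusion equation above can be rearranged to C = F × d / (D × A) and evaluated numerically. The exponents of F, d and A used below are assumptions: they are the values that make the equation reproduce the stated concentration of about 10 µM, and should be verified against Supplementary Table 2.

```python
# Assumed parameter values (SI units); exponents inferred, see note above.
D = 4.6e-9    # diffusivity of H2 in water, m^2 s^-1
d = 2.1e-6    # distance between H2 source and H2 sink, m
A = 0.4e-12   # area of diffusion, m^2
F = 9e-18     # hydrogen flux per cell, mol s^-1


def concentration_from_flux(flux, diffusivity, distance, area):
    """Solve F = (D / d) * C * A for C, returned in mol m^-3 (equal to mM)."""
    return flux * distance / (diffusivity * area)


C_mM = concentration_from_flux(F, D, d, A)
C_uM = C_mM * 1e3  # convert mM to micromolar
```

Under these assumed values the result is close to 10 µM, which is above the 5.4 µM maximum quoted for NADH-dependent hydrogen formation.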
Reference material and data availability. The culture was deposited in the 
cryo-preserved state in liquid nitrogen at the American Type Culture Collection 
and has been accessioned as Lenisia limosa (strain LL-12) AcqID-00721, and will 
be publicly available as soon as possible. In the meantime, we will provide L. limosa 
from our laboratory stocks upon request. Sequence data are available for download 
from the NCBI SRA database and the Whole Genome Shotgun database and are 
grouped as NCBI BioProject PRJNA277740. The SSU rRNA gene for L. limosa is 
available in GenBank under accession number KT023596. The mass spectrometry 
proteomics data and the protein sequence database have been deposited in the 
ProteomeXchange Consortium via the PRIDE partner repository with the data 
set identifier PXD003275. 


Extended Data Figure 1 | Micrographs for L. limosa and epibiotic 
Arcobacter. a, Scanning electron micrograph showing L. limosa and 
associated bacteria. Pilus (1) connecting Arcobacter (5) with L. limosa. 
Pseudopodial extensions (2) are used for the acquisition of prey bacteria (3) 
(Alteromonas). Short anterior flagellum (4). Long posterior flagellum (6). 
b-e, CARD-FISH labelling with probes targeting the SSU rRNA of 
L. limosa (Euk516 in red) and Arcobacter (Epsy914 in green). The scale bar 
applies to all figures. f-i, CARD-FISH labelling of L. limosa with probes 
targeting the SSU rRNA of A. macleodii (Alt184 in green). The scale bar 
applies to all figures. j-r, Transmission electron micrographs showing 
different structural features of L. limosa. Mitochondria-related organelle 
(mro), nucleus (nucl), digestive vacuoles (dv), double basal body (bb), 
endoplasmic reticulum (er), inner (im) and outer membrane (om), tubular 
cristae (cristae), extracellular matrix (ex), bacterium (bac), membrane 
(mem), flagellum (flag), multivesicular body (mb). For a-i, each specimen 
shown represents at least ten specimens for which images were recorded. 
Extended Data Figure 2 | Relative abundance of L. limosa and 
co-enriched bacteria under different growth conditions. The abundance 
of L. limosa and its associated microbiota was determined at three 
different conditions (treatments) with three independent experiments 
per treatment: 1-3, presence of nitrous oxide and prey bacteria; 
4-6, absence of nitrous oxide and presence of prey bacteria; 7-9, presence 
of nitrous oxide, dissolved organic nutrients and hydrogen and absence 
of prey bacteria. Relative abundances were determined via proteomics 
and estimated on the basis of the total normalized spectrum count per 
population. 
The 3.0 Mb genome of Arcobacter spec. 

Genome statistics | Value | CheckM quality check | Value 
Genome size | 3.0 Mb | Marker lineage | Epsilonproteobacteria 
%GC | 31.94 +/- 2.52 | No. of marker genes | 447 
Average coverage | 15.3 | Present 1x | 428 
Contigs | 229 | Present 2x | 18 
Shortest contig | 1009 bp | Absent | 1 
Longest contig | 528509 bp | | 
N50 | 282378 | % Completeness | 99.6 

[Pie chart: 2892 coding sequences, 1385 assigned to subsystems. Categories: 
Amino Acids and Derivatives; Protein Metabolism; Regulation and Cell Signaling; 
Virulence, Disease and Defence; Motility and Chemotaxis; Cofactors, Vitamins, 
Prosthetic Groups, Pigments; Membrane Transport; Respiration; Stress Response; 
RNA Metabolism; Iron Acquisition and Metabolism; Cell Wall and Capsule; 
Cell Division and Cell Cycle; Carbohydrates; Miscellaneous; Fatty acids, Lipids 
and Isoprenoids; Sulfur Metabolism; Nitrogen Metabolism; Phosphorus Metabolism; 
DNA Metabolism; Nucleosides and Nucleotides.] 

The 46.7 Mb genome of Lenisia limosa 

Genome statistics | Value | Genome quality check | Value 
Genome size | 46.7 Mb | Marker lineage | Eukaryota 
%GC | 41.06 +/- 3.79 | No. of marker genes | 159 
Average coverage | 16.3 | Present | 151 
Contigs | 11728 | Absent | 8 
Longest contig | 135.5 kb | | 
N50 | 9095 | % Completeness | 95 

Extended Data Figure 3 | Genome statistics for L. limosa and epibiotic Arcobacter. The pie chart represents the classifications of gene models into 
functional categories for Arcobacter. Gene classifications were performed with the RAST functional annotations and the SEED subsystem database. 
Extended Data Figure 4 | A new type of NAD(P)H-dependent 
Fe-hydrogenase. The genome of L. limosa encoded a so far undescribed 
NAD(P)H-dependent Fe-hydrogenase. Genes with identical domain 
architecture were also identified in P. biforma and T. vaginalis (shown in 
bold type). The scale bars represent substitution rate per site. a, Phylogeny 
of the Fe-hydrogenase domain. b, Phylogeny of the NAD/NADP 
binding domain. Phylogenies were inferred by RAxML using the 
WAG amino-acid replacement matrix. c, Domain architecture of the 
NAD(P)H-dependent Fe-hydrogenase (2) compared with the domain 
architecture of Fe-hydrogenase (3) and the NADPH accepting domain of 
the cyt P450 reductase (1). The scale bar shows approximate amino-acid 
positions. d, Predicted electron flow within the NAD(P)H-dependent 
Fe-hydrogenase indicates the capability for a proton-dependent recycling 
of NAD(P)H. Note: the shape of the model is not intended to depict the 
actual three-dimensional structure of the protein. 
[Tree: quinone-reactive Ni/Fe-hydrogenase large chain. Labelled taxa: Escherichia coli (strain K12); 
Citrobacter freundii; Mannheimia succiniciproducens (strain MBEL55E); Shewanella oneidensis; 
Arcobacter nitrofigilis (strain ATCC 33309); Wolinella succinogenes; Helicobacter acinonychis 
(strain Sheeba); Helicobacter pylori (strain ATCC 700824).] 
Extended Data Figure 5 | Maximum likelihood tree of quinone-reactive Ni/Fe-hydrogenases (subunit hydB). The tree shows the phylogenetic 
relation of quinone-reactive Ni/Fe-hydrogenases from Arcobacter associated with S. tetraspora, P. biforma and L. limosa (indicated in red). Circles 
represent bootstrap support values for each node. The scale bar represents substitution rate per site. 
Extended Data Figure 6 | The fitness of L. limosa depends on its 
symbiont. Syntrophy was enabled by the presence of nitrous oxide acting 
as electron acceptor for bacterial hydrogen oxidation. a, Inhibition of 
nitrous oxide reduction (addition of the competitive inhibitor acetylene, 
see arrow) led to a reduced growth of L. limosa and reduced respiration 
rates. To monitor respiration rates, ¹³C-enriched Alteromonas were added 
together with acetylene. Digestion of ¹³C-labelled bacteria by L. limosa 
led to the production of ¹³C-bicarbonate, which was measured after 
conversion to ¹³CO2 (right). Similar effects on the growth and respiration 
rates were observed after adding hydrogen (b) or hydrogen and acetate (c) 
to a culture. d, Growth of L. limosa and production of hydrogen and fatty 
acids while growing syntrophically (nitrous oxide present). e, Growth of 
L. limosa in the presence of antibacterial antibiotics (nitrous oxide absent). 
f, Growth of L. limosa in the presence of antibacterial antibiotics (nitrous 
oxide present). g, Growth of L. limosa in the presence of nitrate (2 mM) 
and oxygen (0.2 mM). Growth of L. limosa was compared with a culture 
that contained nitrous oxide (2.2 mM) and with a control culture that did 
not contain an electron acceptor for hydrogen oxidation. Each panel shows 
the results of at least five independent experiments, with cell numbers 
depicted as averages of seven cell counts per experiment; error bars, s.d. 
Extended Data Figure 7 | Expression levels for Arcobacter proteins 
involved in attachment and chemotaxis. a, Expression level of proteins 
involved in attachment in the presence (red) and absence of L. limosa 
(blue). b, Expression level of proteins involved in chemotaxis. Expression 
levels were measured and averaged for three independent experiments 
per treatment (see also Extended Data Fig. 2). Error bars, s.d. See 
Supplementary Table 1 for gene accession numbers and statistical tests. 
Extended Data Figure 8 | Domain architecture of L. limosa fibronectin type III domain-containing proteins. Protein architectures and conserved 
protein domains were identified using the SMART protein domain detection tools. See Supplementary Table 1 for gene accession numbers and 
expression levels. 
Extended Data Table 1 | Potential presence of Breviatea and Breviatea-associated Arcobacter detected in currently available shotgun 
metagenomes from marine sediments 

SRA accession | F* | Description | #reads | Breviatea (including L. limosa) abundance (per 1 million reads) | P value | Breviatea-associated Arcobacter abundance (per 1 million reads) 
SRR2638107 | + | sediment, upper 2cm, Barataria Bay (USA) | 869558078 | 0.043 | 0.0000 | 0.64 
SRR2657558 | + | sediment, upper 2cm, Bay Jimmy (USA) | 943961688 | 0.036 | 0.0000 | 0.86 
SRR2636951 | + | sediment, upper 2cm, Barataria Bay (USA) | 738401828 | 0.037 | 0.0000 | 0.79 
SRR2637699 | + | sediment, upper 2cm, Barataria Bay (USA) | 986158388 | 0.028 | 0.0000 | 0.13 
SRR2657575 | + | sediment, upper 2cm, Bay Jimmy (USA) | 642092762 | 0.028 | 0.0000 | 3.09 
SRR1793930 | - | igneous basalt, Louisville Seamounts | 65221092 | 0.123 | 0.0000 | 0.02 
SRR577221† | + | tidal flat sediment upper 2cm | 577090 | 5.198 | 0.0000 | 3.47 
SRR577224† | + | tidal flat sediment upper 2cm | 682998 | 4.392 | 0.0000 | 10.25 
SRR2657076 | + | sediment, upper 2cm, Barataria Bay (USA) | 164252226 | 0.055 | 0.0000 | 0.15 
SRR2657962 | + | sediment, upper 2cm, Barataria Bay (USA) | 595816470 | 0.015 | 0.0006 | 3.65 
SRR577220† | + | tidal flat sediment upper 2cm | 581409 | 1.720 | 0.0023 | 6.88 
SRR2637322 | + | sediment, upper 2cm, Barataria Bay (USA) | 144887606 | 0.028 | 0.0025 | 0.22 
SRR2656923 | + | sediment, upper 2cm, Barataria Bay (USA) | 695600554 | 0.012 | 0.0052 | 0.14 
SRR2657585 | + | sediment, upper 2cm, Barataria Bay (USA) | 607701178 | 0.012 | 0.0083 | 0.02 
SRR2637706 | - | Sediment, 8-12cm, Barataria Bay (USA) | 140402750 | 0.021 | 0.0164 | 0.14 
SRR2657208 | + | sediment, upper 2cm, Terrebonne (USA) | 402338618 | 0.012 | 0.0173 | 0.01 
SRR2657207 | + | sediment, upper 2cm, Barataria Bay (USA) | 137181860 | 0.015 | 0.0855 | 0.04 
SRR2658004 | + | sediment, upper 2cm, Barataria Bay (USA) | 162613058 | 0.012 | 0.1087 | 2.64 
SRR1627906 | - | ocean sediment Costa Rica, 32m deep | 35468880 | 0.028 | 0.1219 | 0.00 
SRR2637708 | - | Sediment, 8-12cm, Barataria Bay (USA) | 943467314 | 0.002 | 0.1668 | 0.06 
SRR2637690 | - | Sediment, 8-12cm, Barataria Bay (USA) | 596039480 | 0.005 | 0.2067 | 0.22 
SRR2658026 | - | Sediment, 8-12cm, Barataria Bay (USA) | 604747158 | 0.005 | 0.2086 | 0.24 
SRR2657594 | - | Sediment, 8-12cm, Barataria Bay (USA) | 619813316 | 0.002 | 0.2113 | 0.84 
SRR2657566 | - | Sediment, 8-12cm, Barataria Bay (USA) | 607030206 | 0.002 | 0.2176 | 0.09 
SRR2657582 | + | sediment, upper 2cm, Barataria Bay (USA) | 803328968 | 0.004 | 0.2229 | 0.10 
SRR2657625 | + | sediment, upper 2cm, Barataria Bay (USA) | 505336664 | 0.004 | 0.2707 | 4.71 
SRR2656927 | - | Sediment, 8-12cm, Barataria Bay (USA) | 120576078 | 0.008 | 0.2960 | 0.15 
SRR2657579 | + | sediment, upper 2cm, Barataria Bay (USA) | 135244394 | 0.007 | 0.3133 | 0.13 
SRR2656924 | + | sediment, upper 2cm, Barataria Bay (USA) | 143593200 | 0.007 | 0.3218 | 0.10 
SRR2656926 | + | sediment, upper 2cm, Barataria Bay (USA) | 157234396 | 0.006 | 0.3339 | 0.43 
SRR2656925 | - | Sediment, 8-12cm, Barataria Bay (USA) | 618546874 | 0.000 | - | 0.01 
SRR2657909 | - | Sediment, 8-12cm, Barataria Bay (USA) | 484269622 | 0.000 | - | 0.21 
SRR1179191 | - | Mahoney Lake (euxinic) | 199487463 | 0.000 | - | 0.00 
SRR2638077 | - | Sediment, 8-12cm, Barataria Bay (USA) | 196861140 | 0.000 | - | 0.03 
SRR2657627 | + | Sediment, 0-2cm, Barataria Bay (USA) | 157919092 | 0.000 | - | 2.29 
SRR2657590 | + | Sediment, 0-2cm, Barataria Bay (USA) | 149050036 | 0.000 | - | 0.02 
SRR2657155 | - | Sediment, 8-12cm, Barataria Bay (USA) | 121323126 | 0.000 | - | 0.17 
SRR1627905 | - | ocean sediment Costa Rica margin, 2.9m | 86658932 | 0.000 | - | 0.02 
SRR1628696 | - | pacific ocean 280m deep, igneous rock | 84164942 | 0.000 | - | 0.00 
SRR1971620 | - | Haakon Mosby mud volcano, 3.7m deep | 46060794 | 0.000 | - | 0.00 
SRR1628698 | - | pacific ocean 280m deep, igneous rock | 45206218 | 0.000 | - | 1.26 
SRR1628697 | - | pacific ocean 280m deep, igneous rock | 42072910 | 0.000 | - | 0.00 
SRR1971621 | - | Haakon Mosby mud volcano, 3.7m deep | 41183206 | 0.000 | - | 0.15 
SRR1627907 | - | ocean sediment Costa Rica margin, 94m deep | 34240066 | 0.000 | - | 0.00 
SRR1793929 | - | pacific ocean, basalt | 30470478 | 0.000 | - | 0.00 
SRR1793931 | - | pacific ocean 130m deep, igneous rock | 30177380 | 0.000 | - | 0.00 
SRR1022349 | + | marine fish farm sediment | 26561056 | 0.000 | - | 0.15 
SRR1971622 | - | Haakon Mosby mud volcano, 2.8m deep | 20696408 | 0.000 | - | 0.00 
SRR1793928 | - | pacific ocean, drill fluid | 1537452 | 0.000 | - | 0.00 
SRR577219† | + | tidal flat sediment, upper 5cm | 667625 | 0.000 | - | 1.50 

*Sediments were grouped into two habitat types: (+) indicates habitats favourable for growth of Breviatea, (-) indicates habitats unfavourable for growth of Breviatea. The P values (calculated from a 
binomial distribution with a false positive rate obtained from the unfavourable habitats) are the probabilities that the actual number of reads assigned to Breviatea was coincidental and not related to 
the potential presence of Breviatea. 
†These metagenomes were obtained from the same site as the inocula for the enrichment of L. limosa in the present study. 
Universality of human microbial dynamics 


Amir Bashan, Travis E. Gibson, Jonathan Friedman, Vincent J. Carey, Scott T. Weiss, Elizabeth L. Hohmann & Yang-Yu Liu 


Human-associated microbial communities have a crucial role in 
determining our health and well-being, and this has led to the 
continuing development of microbiome-based therapies such as 
faecal microbiota transplantation. These microbial communities 
are very complex, dynamic and highly personalized ecosystems, 
exhibiting a high degree of inter-individual variability in both 
species assemblages and abundance profiles. It is not known 
whether the underlying ecological dynamics of these communities, 
which can be parameterized by growth rates, and intra- and 
inter-species interactions in population dynamics models, are 
largely host-independent (that is, universal) or host-specific. If 
the inter-individual variability reflects host-specific dynamics 
due to differences in host lifestyle, physiology or genetics, 
then generic microbiome manipulations may have unintended 
consequences, rendering them ineffective or even detrimental. 
Alternatively, microbial ecosystems of different subjects may 
exhibit universal dynamics, with the inter-individual variability 
mainly originating from differences in the sets of colonizing 
species. Here we develop a new computational method to 
characterize human microbial dynamics. By applying this method 
to cross-sectional data from two large-scale metagenomic studies, 
the Human Microbiome Project and the Student Microbiome 
Project, we show that gut and mouth microbiomes display 
pronounced universal dynamics, whereas communities associated 
with certain skin sites are probably shaped by differences in the host 
environment. Notably, the universality of gut microbial dynamics 
is not observed in subjects with recurrent Clostridium difficile 
infection but is observed in the same set of subjects after faecal 
microbiota transplantation. These results fundamentally improve 
our understanding of the processes that shape human microbial 
ecosystems, and pave the way to designing general microbiome-based 
therapies. 

The underlying dynamics of a microbial ecosystem, that is, the eco- 
logical interactions that govern its change, equilibrium and stability, 
can be represented by a population dynamic model 


$$\frac{\mathrm{d}x^{(\nu)}(t)}{\mathrm{d}t} = f\left(x^{(\nu)}(t), \Theta^{(\nu)}\right) \qquad (1)$$


which describes the time-dependent abundance profile 
$x^{(\nu)}(t) = (x_1^{(\nu)}(t), \ldots, x_N^{(\nu)}(t))$ of N microbial species present in a particular 
body site of subject $\nu$. Here, $f(x^{(\nu)}, \Theta^{(\nu)})$ is typically a nonlinear 
function and $\Theta^{(\nu)}$ captures all the ecological parameters, that is, growth 
rates, and intra- and inter-species interactions. Those parameters may 
generally depend on host-independent factors, such as biochemical 
processes and microbial metabolic pathways; and on host-specific 
ones, such as nutrient intake and host genetic make-up. 

Three fundamental cases could represent the dynamics of M healthy 
subjects: (1) individual dynamics, in which the ecological parameters 
are different in different subjects, that is, $\Theta^{(1)} \neq \cdots \neq \Theta^{(M)}$; (2) group 
dynamics, in which subjects can be classified into K groups (K < M) 
on the basis of certain host factors and subjects in the same group 
share the same set of parameters, that is, $\Theta^{(\nu)} = \Theta_P$ for all subjects in 
group P (P = 1, ..., K); and (3) universal dynamics, in which all the 
subjects share the same set of parameters, that is, $\Theta^{(\nu)} = \Theta$ for all 
subjects. If we represent the ecological parameters, such as the 
inter-species interactions, in a directed, weighted ecological network, 
the above three cases can be easily visualized (see Fig. 1). 

Despite its crucial consequences, we do not know which case 
best represents the microbial ecosystems of healthy individuals. 
Addressing this question is vital for developing microbiome-based 
therapies. Indeed, if the dynamics are universal, the inter-personal 
variability stems solely from the different assemblages of 
colonizing species in different individuals. We can then design 
general interventions to control the microbial state (in terms of species 
assemblage and abundance profile) of different individuals. By 
contrast, if the dynamics are strongly host-specific, we must design 
truly personalized interventions, which need to consider not only the 
unique microbial state of an individual but also the unique dynamics 
of the underlying microbial ecosystem. In addition, host-specific 
dynamics, if they exist, raise a major safety concern for faecal microbiota 
transplantation (FMT), because although the healthy microbiota 
are stable in the donor’s gut, they may be shifted to an undesired state 
in the recipient’s gut. 

Figure 1 | Alternative scenarios of microbial dynamics across different 
healthy subjects. Microbial dynamics captured by equation (1) is simply 
represented by an ecological network, in which nodes represent species 
(with node sizes proportional to growth rates) and edges represent inter-species 
interactions (with green and red arrows representing excitatory 
and inhibitory interactions, respectively). Different subjects typically have 
different species assemblages, represented by coloured circles near each 
subject. a, The underlying dynamics/network is unique for each subject. 
b, Subjects within the same group share the same dynamics/network that is 
significantly different from that of other groups. c, Different subjects have 
the same underlying dynamics/network. 

Figure 2 | Higher overlap of microbial communities is associated with 
lower dissimilarity. a, Four gut microbial sample pairs (i-iv) represented 
by stacked bars at the genus level. For each sample pair, their shared genera 
are coloured while non-shared genera are shown in grey. b, DOC 
(in dark blue) of gut microbial sample pairs from the HMP study 
(M = 190 samples). Grey dots represent all the 17,955 sample pairs. 
c, DOC (in dark red) of the randomized samples is flat. In b and c, 
and throughout the paper, shaded area indicates the range of the 94% 
confidence interval (see Methods). 

The ideal approach to addressing this fundamental question would 
be to infer the dynamic model captured by equation (1) for a large 
number of healthy individuals from temporal metagenomic data, and 
then compare the system parameters $\Theta^{(\nu)}$ directly. However, empirical 
parameterization of the exact functional form of $f(x^{(\nu)}, \Theta^{(\nu)})$ is 
extremely difficult for complex ecological systems. Furthermore, inferring 
the system parameters typically requires high-quality time series 
data and well-designed experiments to ensure the system parameters 
are identifiable. Such data sets are not currently available. A conventional 
correlation analysis of cross-sectional data cannot address this 
question either, because it only captures effective (or indirect) interactions 
and is subject to spurious correlations due to the compositionality 
of relative abundances in genomic survey data. 

To overcome these issues, we developed a novel method to detect 
‘fingerprints’ of universal microbial dynamics. This is achieved by 
restricting ourselves to answer the question of whether the dynamics 
are universal or not, rather than the broader and harder question of 
what the dynamics are. The key idea is that when comparing microbial 
communities (samples) from different subjects, we distinguish 
between two contributors to the inter-individual variability: the difference 
in species assemblages and the difference in abundance profiles. 
We quantify those two contributors by: $O(\tilde{x}, \tilde{y})$, the overlap of the 
species assemblages, calculated from the relative abundances of the 
shared species; and $D(\hat{x}, \hat{y})$, the dissimilarity between the renormalized 
abundance profiles of the shared species (see Methods). Note that 
the two measures (overlap and dissimilarity) are not a priori dependent 
on each other. Indeed, $D(\hat{x}, \hat{y})$ is mathematically not constrained 
by any value of $O(\tilde{x}, \tilde{y}) > 0$ (see Supplementary Information section 
1.2.1 for details). Hence any constraints of $D(\hat{x}, \hat{y})$ by $O(\tilde{x}, \tilde{y})$ observed 
from real data deserve our attention and may have ecological interpretations 
(see Fig. 2a, b). 

To compare samples systematically from a given microbiome 
data set, we first calculate the overlap and dissimilarity of all 
the sample pairs and represent each sample pair as a point in the 
dissimilarity-overlap plane. We then perform nonparametric 
regression and bootstrap sampling to calculate the dissimilarity-overlap 
curve (DOC) and its confidence interval (see Fig. 2b and 
Methods). In the case of (i) individual dynamics, or (ii) univer- 
sal dynamics but without inter-species interactions, a flat DOC is 
expected (see Supplementary Information section 1.2.3). By contrast, 
for systems with universal dynamics and inter-species interactions, we 
expect the corresponding DOC to display a characteristic feature: a 
negative slope in the high-overlap region; that is, abundance profiles of 
sample pairs become more similar as their overlap becomes higher (see 
Supplementary Information section 1.2 and Extended Data Fig. 1). A 
negative slope can also be seen in the DOC of microbial communities 
characterized by group dynamics. The existence of such group dynam- 
ics however can be easily detected by standard ordination techniques 
and clustering analysis”** and hence ruled out (see Extended Data 
Fig. 2). Note that the DOC analysis described above is not affected by 
the compositionality of the genomic survey data and requires neither 
time series data nor any a priori knowledge of the specific ecological 
dynamics. Instead, it only relies on a few reasonable assumptions (see 
Methods). 

To verify our DOC analysis, we first applied it to synthetic data gen- 
erated from the canonical generalized Lotka-Volterra (GLV) model, 
which has been used for predictive modelling of the intestinal micro- 
biota*-’”. Extended Data Figure 3 shows that in the case of universal 
dynamics with strong inter-species interactions, the DOC displays a 
clear negative slope in the high-overlap region. By contrast, in the case 
of individual dynamics or universal dynamics without inter-species 
interactions, a flat DOC is observed. 

To verify the DOC analysis directly using real data, we analysed 
longitudinal gut microbial samples of four healthy individuals from 
two microbiome studies (refs 11, 28). For each individual, we expect a highly
universal microbial dynamics throughout the period of measurement; 
that is, the ecological parameters $\Theta^{(\nu)}$ of the corresponding microbial
community are largely time-invariant. We found that the DOCs of all 
four subjects show a clear negative slope in the high-overlap region 
(Extended Data Fig. 4), consistent with our expectation. 

Next, we systematically analysed cross-sectional microbial samples
of different body sites from two large-scale metagenomic studies, the
Human Microbiome Project (HMP) (refs 9, 15) and the Student Microbiome
Project (SMP) (ref. 16). The results are shown in Fig. 3 and Extended
Data Fig. 5. In Fig. 3, for each body site the DOCs calculated from real
and randomized samples are shown in dark blue and red, respectively. The
overlap distributions of the real between-subjects sample pairs are
shown in pink. Note that the characteristic overlap in a particular body
site is different in the two studies. For example, the average overlap
between HMP gut samples is about 0.4 and between SMP samples is
about 0.75. To account for this fact and to compare the DOCs fairly
across different body sites and studies, we used two different measures
to quantify the universality (see Methods). Notably, although these two
measures quantify different features of the DOC analysis, the body-site
stratification pattern is consistent across the two measures and
the two studied data sets (Extended Data Fig. 6). In particular, the
negative slope of the DOC is most significantly observed in samples from
the gut and mouth and least observed in samples from hand skin (palm
and elbow). These findings strongly suggest the existence of universal
dynamics characterized by inter-species interactions in the gut and
mouth microbiomes.

Figure 3 | Detecting universality of microbial dynamics in different body
sites. a-h, We calculated DOCs for real (dark blue) and randomized (dark
red) samples of two data sets: (1) SMP: gut (a), tongue (b), forehead skin (c1),
palm skin (c2); (2) HMP: gut (d), tongue dorsum (e1), attached keratinized
gingiva (e2), buccal mucosa (e3), hard palate (e4), palatine tonsils (e5),
subgingival plaque (e6), supragingival plaque (e7), throat (e8), saliva (e9),
left/right antecubital fossa (f1/f2), left/right retroauricular crease (f3/f4),
vaginal introitus (g1), mid-vagina (g2), posterior fornix (g3), anterior
nares (h). The overlap distributions of the real between-subjects sample
pairs are shown in pink. The vertical green line represents the change
point (see Methods).

An alternative explanation for the observed negative slope of the 
DOC for gut and mouth microbiomes of healthy subjects could be that 
some host factors not only select for the presence of certain microbes 
but also drive their relative abundances by enforcing certain optimally 
adapted compositions. To test this alternative explanation, we system- 
atically analysed microbial samples while controlling for the effect of 
several leading candidates for potential confounding factors, for exam- 
ple, body mass index, age, long-term dietary pattern and stool con- 
sistency. We found that as long as their values are in the normal range 
those factors cannot explain the observed DOC pattern (see Extended 
Data Figs 7 and 8). Hence, the alternative explanation for the negative 
slope in DOC is unlikely to be true. Of course, with currently available 
data sets we cannot possibly account for all other confounders, such as 
drugs, genetics, inflammation, or their combinations. More data sets 
will be needed to test this intriguing alternative explanation. 

The above results of healthy subjects raise an interesting question: 
does the universality of microbial dynamics also exist in subjects with 
disrupted microbiomes? To address this question, we applied the DOC 
analysis to microbial samples of 17 subjects with recurrent Clostridium 
difficile infection (rCDI) and the same set of subjects after FMT (ref. 17).
Clostridium difficile is an opportunistic pathogen that causes disease 
worldwide and greatly increases morbidity and mortality in hospital- 
ized patients. Fortunately, FMT is very efficacious in treating patients 
with rCDI, with pronounced clinical improvement even after a single 
treatment”. We found that the dissimilarity between rCDI subjects 
is largely independent of their species overlap, rendering a flat DOC 
(Fig. 4a). By contrast, after FMT (median, 4 days) the DOC displays a 
pronounced negative slope in the high-overlap region (Fig. 4b), sug- 
gesting a universal gut microbial dynamics. FMT treatments show the 
flexibility of microbial communities and their adaptation to composi- 
tion changes. Our result suggests that this adaptive behaviour may be 
associated with the observed universal microbial dynamics after FMT. 


Figure 4 | DOC analysis of human subjects with rCDI. a, Before FMT, the
DOC (dark green line) of the rCDI subjects is nearly flat. b, After FMT,
the DOC (dark blue line) displays a pronounced negative slope in the
high-overlap region. We denoted a subject pair as a solid (or hollow)
circle if the two subjects received FMT from the same donor (or two
different donors). Notably, solid circles spread over a [...] even if
two subjects share the same donor, their [...]

Finally, we anticipate that applying our DOC analysis to subjects
with other diseases (especially non-gastrointestinal diseases) or infants 
at different developmental stages will offer deeper insights into how 
dynamical processes shape human microbial ecosystems. The devel- 
oped DOC analysis can also be directly applied to other microbial 
ecosystems—for example, the microbiome of soil, ocean, lakes, 
phyllosphere/rhizosphere and fermenters—to detect the universality 
of the underlying ecological dynamics (see Extended Data Fig. 9). 
This sheds light on the design of more advanced methods to extract 
dynamical information from microbial data. 


METHODS


All the data sets analysed in this work have been published (see Methods section 
‘Human microbiome data sets analysed in this work’ for details). The original
experiments and corresponding power analysis have been reported in previous 
publications. 

Overlap between species assemblages. Consider two microbial samples, represented
by two abundance vectors $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ and
$y = (y_1, \ldots, y_N) \in \mathbb{R}^N$. For genomic survey data of the human
microbiome, only the relative abundances are known. Hence, we are dealing with
the relative abundance profiles $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_N)$
and $\tilde{y} = (\tilde{y}_1, \ldots, \tilde{y}_N)$, where

$$\tilde{x}_i = \frac{x_i}{\sum_{j=1}^{N} x_j}, \qquad \tilde{y}_i = \frac{y_i}{\sum_{j=1}^{N} y_j}.$$

To quantify the similarity of the species assemblages (sets) of the two samples,
denoted as $X = \{i \mid x_i > 0\}$ and $Y = \{i \mid y_i > 0\}$, we defined the
overlap measure

$$O(\tilde{x}, \tilde{y}) = \sum_{i \in S} \frac{\tilde{x}_i + \tilde{y}_i}{2},$$

where $S = X \cap Y$ is the set of shared species present in both samples. In
case $S$ is empty, $O(\tilde{x}, \tilde{y}) = 0$. If $S = \{1, \ldots, N\}$, that
is, all the species in $X$ and $Y$ are shared, then $O(\tilde{x}, \tilde{y}) = 1$,
but the abundance profiles $\tilde{x}$ and $\tilde{y}$ can still be very
different. In the extreme case when the relative abundance is the same for all
species in $X$ and $Y$, the overlap measure can be written as a function of the
classical Jaccard index. Yet, there are many advantages of using the overlap
measure, instead of the Jaccard index, in our analysis (see Supplementary
Information section 1.1.4).
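As an illustration of the overlap definition above, here is a minimal Python sketch (not the authors' Matlab code). The function names `relative_abundances` and `overlap` are hypothetical, and the sketch assumes both samples are equal-length lists of non-negative abundances with a non-zero total.

```python
def relative_abundances(counts):
    """Normalize a vector of non-negative abundances so that it sums to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def overlap(x, y):
    """Overlap O: sum over shared species of the mean relative abundance."""
    xr, yr = relative_abundances(x), relative_abundances(y)
    # Shared species are those present (abundance > 0) in both samples.
    shared = [i for i in range(len(x)) if x[i] > 0 and y[i] > 0]
    return sum((xr[i] + yr[i]) / 2 for i in shared)


a = [4, 4, 2, 0]   # species 0, 1, 2 present
b = [1, 0, 1, 2]   # species 0, 2, 3 present
print(overlap(a, b))  # approximately 0.55 (shared species: 0 and 2)
```

Two samples with no shared species give an overlap of 0, and two samples with identical assemblages give 1 regardless of how their abundances differ.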
Dissimilarity between abundance profiles. To compare the abundance
profiles of two samples, we first renormalize the relative abundances of
only the shared species (in set $S$), yielding $\hat{x} = \{\hat{x}_i\}_{i \in S}$
and $\hat{y} = \{\hat{y}_i\}_{i \in S}$. Here

$$\hat{x}_i = \frac{\tilde{x}_i}{\sum_{j \in S} \tilde{x}_j} = \frac{x_i / \sum_{k \in X} x_k}{\sum_{j \in S} x_j / \sum_{k \in X} x_k} = \frac{x_i}{\sum_{j \in S} x_j},$$

and $\hat{y}_i$ is defined similarly. This way we
remove the spurious dependence between the relative abundances of the shared
and the non-shared species. More importantly, this renormalization assures that
the calculated dissimilarity measure is mathematically independent of the overlap
measure (see Supplementary Information section 1.2.1). The dissimilarity is then
evaluated via the root Jensen-Shannon divergence (rJSD) measure

$$D(\hat{x}, \hat{y}) \equiv D_{rJSD}(\hat{x}, \hat{y}) = \left[ \frac{D_{KL}(\hat{x}, m) + D_{KL}(\hat{y}, m)}{2} \right]^{1/2},$$

in which $m = \frac{\hat{x} + \hat{y}}{2}$ and
$D_{KL}(\hat{x}, \hat{y}) = \sum_{i \in S} \hat{x}_i \log \frac{\hat{x}_i}{\hat{y}_i}$
is the Kullback-Leibler divergence between $\hat{x}$ and $\hat{y}$. The
dissimilarity can also be evaluated via any other classical dissimilarity
measures in ecology and biology, for example, Bray-Curtis
dissimilarity, Yue-Clayton dissimilarity, and the negative Spearman correlation 
(see Extended Data Fig. 5). In this work, we focused on rJSD because it is a distance 
metric that satisfies non-negativity, identity, symmetry and triangle inequality 
(see Supplementary Information section 1.1.1). Comparing sample pairs on the 
basis of phylogenetic information, for example, using weighted- and unweighted- 
UniFrac*” as quantitative and qualitative measures, respectively, has the potential 
to provide better insight on the communities’ dissimilarity-overlap behaviour. 
However, since the weighted- and unweighted-UniFrac are not mathematically 
independent, they cannot be trivially integrated into our DOC analysis. 
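The renormalization and the rJSD measure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the natural logarithm is an assumption (the base is not stated here), and the clamp at zero only guards against floating-point round-off.

```python
import math


def renormalize_shared(x, y):
    """Keep only species present in both samples and rescale each to sum to 1."""
    shared = [i for i in range(len(x)) if x[i] > 0 and y[i] > 0]
    sx = sum(x[i] for i in shared)
    sy = sum(y[i] for i in shared)
    return [x[i] / sx for i in shared], [y[i] / sy for i in shared]


def kl_divergence(p, q):
    """Kullback-Leibler divergence; all entries of p and q are positive here."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rjsd(x, y):
    """Root Jensen-Shannon divergence between renormalized shared profiles."""
    xh, yh = renormalize_shared(x, y)
    if not xh:
        raise ValueError("no shared species: dissimilarity is undefined")
    m = [(a + b) / 2 for a, b in zip(xh, yh)]
    jsd = (kl_divergence(xh, m) + kl_divergence(yh, m)) / 2
    return math.sqrt(max(0.0, jsd))  # assumption: clamp tiny negative round-off


# The non-shared third species does not affect the result.
print(rjsd([1, 1, 0], [2, 2, 5]))  # 0.0: identical profiles over shared species
print(rjsd([1, 3], [3, 1]))
```

Because only the shared species enter, adding or removing non-shared species changes the overlap but leaves this dissimilarity unchanged, which is the independence property the text relies on.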
DOC. To compare sample pairs systematically with a wide range of overlap val- 
ues and analyse their dissimilarity—overlap relations, we calculate the overlap and 
dissimilarity of all the sample pairs from a given set of microbiome samples and 
represent each sample pair as a point in the dissimilarity-overlap plane. We then use 
the robust LOWESS (locally weighted scatterplot smoothing) method, a standard 
non-parametric regression method that is resistant to outliers, to calculate the DOC. 
To get the confidence interval, we use the following bootstrap technique. 
(1) From a data set of M samples we calculate the overlap and dissimilarity of the
M(M - 1)/2 sample pairs, represented as M(M - 1)/2 points in the
overlap-dissimilarity plane. (2) In each bootstrap realization, we resample a new set
$K = \{k_1, \ldots, k_M\}$ from the M original samples with replacement. Some of the
original samples might not be included and some could be sampled more than once.
(3) We create a new cloud of points C: a point associated with sample pair (i, j) is
included in C only if both i, j ∈ K, while a point is chosen several times if the
sample i or j were resampled more than once in K. (4) A new DOC is calculated for C
using the robust LOWESS method. We set the smoothing parameter (‘spar’) to be
0.2. (5) We repeat steps (2)-(4) T times to create T DOCs. (6) The 3rd and 97th
percentiles of the T curves represent the 94% confidence interval for the DOC. In
this work, we chose T = 100.
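Steps (1)-(6) above can be sketched as follows. This is a hedged illustration under stated assumptions, not the published Matlab code: a simple moving-window mean stands in for robust LOWESS, pair multiplicities follow one plausible reading of step (3), and all names (`window_mean`, `percentile`, `bootstrap_doc`, `points`) are hypothetical. The input `points[(i, j)]` is assumed to hold the precomputed (overlap, dissimilarity) of each sample pair with i < j.

```python
import random


def window_mean(cloud, grid, half_width=0.1):
    """Crude smoother standing in for robust LOWESS: mean dissimilarity of the
    points whose overlap lies within half_width of each grid value."""
    curve = []
    for g in grid:
        ds = [d for o, d in cloud if abs(o - g) <= half_width]
        curve.append(sum(ds) / len(ds) if ds else None)
    return curve


def percentile(values, q):
    """Rank-based percentile of a non-empty list (no interpolation)."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(q / 100 * (len(s) - 1))))
    return s[k]


def bootstrap_doc(points, n_samples, grid, T=100, seed=0):
    """Return the 3rd and 97th percentile curves over T bootstrap DOCs."""
    rng = random.Random(seed)
    curves = []
    for _ in range(T):
        # Step (2): resample the samples (not the pairs) with replacement.
        K = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Step (3): a pair enters once per combination of resampled copies.
        cloud = []
        for a in range(n_samples):
            for b in range(a + 1, n_samples):
                i, j = sorted((K[a], K[b]))
                if i != j:
                    cloud.append(points[(i, j)])
        # Step (4): smooth the resampled cloud.
        curves.append(window_mean(cloud, grid))
    lower, upper = [], []
    for g in range(len(grid)):
        vals = [c[g] for c in curves if c[g] is not None]
        lower.append(percentile(vals, 3) if vals else None)
        upper.append(percentile(vals, 97) if vals else None)
    return lower, upper
```

Resampling samples rather than pairs respects the fact that the M(M - 1)/2 points are not independent, since each sample contributes to M - 1 of them.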
Assumptions of the DOC analysis. There are two reasonable assumptions under- 
lying the DOC analysis. First, the abundance profiles of the samples should repre- 
sent the steady states of the microbial ecosystem and hence the fixed points of the 
underlying dynamics that satisfy $\dot{x} = 0$. This assumption is fairly reasonable
because human gut microbiota is a relatively resilient ecosystem’, and until the 
next large perturbation (for example, antibiotic administration or dramatic dietary 
change) is introduced, the system remains stable for months and possibly even
years!!783!_ Second, if two communities have the same species assemblages and 
the same abundance profile (steady state), then the two communities have the same 
microbial dynamics. Mathematically, this is not necessarily true, because different 
dynamical systems can give rise to an identical steady state or fixed point. Yet, given 
the large number of species and all the other levels of complexity in their interac- 
tions, the possibility of having different dynamics with the same fixed point is very 
unlikely. Indeed, universal dynamics is the most plausible explanation for the 
observed pattern, that is, the negative slope of DOC in the high-overlap region. 
Limitations of the DOC analysis. We point out that for overlap values close to
zero, a positive slope may occur as an artefact of the dissimilarity between relative
abundance profiles with a small number of species (see Fig. 3e4, f1-f4, g1-3, h,
Extended Data Fig. 10, and Supplementary Information section 1.1.3).

We also emphasize that a flat DOC does not completely rule out the possibility 
of universal dynamics. For example, the DOC of the gut microbiome samples of 
rCDI patients is flat (Fig. 4a). There are two possibilities. First, the universality of 
microbial dynamics found in healthy subjects (Fig. 3a, d) is completely lost in the 
rCDI subjects, owing to the infection and/or the dysbiosis caused by the extensive 
inciting antibiotic treatment. Second, the possibly universal microbial dynamics 
of the rCDI subjects are just undetectable by the DOC analysis. This could be 
due to the extremely liquid stool samples of the rCDI subjects that suffer from 
diarrhoea, as stool consistency has been found to be strongly correlated with the 
gut microbiota compositions**”’, It is also possible that the abundance profiles 
of rCDI subjects are markedly varying over time and hence do not represent the 
steady states of the underlying microbial ecosystem (although a mouse infection 
model does not seem to support this hypothesis"). 

If true multi-stability exists, that is, multiple stable states (abundance profiles) 
are associated with the same set of species present in the same environment, then 
our DOC analysis may not detect it. However, true multi-stability in human- 
associated microbial communities has not been demonstrated experimentally, 
partially because any subtle differences in the species assemblages can drive those 
microbial communities’. 

In summary, our DOC analysis detects universal dynamics under certain
conditions. More precisely, it provides a means of discriminating dynamics into
universal or possibly universal.
Universality measures and statistical test. Note that the DOCs of different data 
sets/studies must be compared with caution, especially if the microbial samples 
were preprocessed by different pipelines**, for example, with different operational 
taxonomic unit (OTU) clustering thresholds, or different OTU picking methods. 
As shown in Fig. 3, the characteristic overlap in a particular body site is different 
in different studies. For example, the average overlap between HMP gut samples 
is about 0.4 and between SMP samples is about 0.75. To account for this fact and 
to compare the DOCs fairly across different body sites and studies, we used two 
different measures to quantify the universality. 

(1) $f_{ns}$: For each cohort we determined the fraction of data points for which the
DOC displays a negative slope, denoted as $f_{ns}$. Specifically, for a given DOC
calculated from a cohort of M microbial samples, we first detected the ‘change point’
$O_c$ such that $\frac{dy(O)}{dO} < 0$ for any $O > O_c$, in which $y(O)$ is a
smoothed curve of the DOC (for example, using the default ‘smooth’ function of
Matlab with moving average over 5 neighbours). Then, $f_{ns}$ is defined as

$$f_{ns} = \frac{\text{number of sample pairs with } O > O_c}{\text{total number of sample pairs}}.$$

In Fig. 3a-h, this is the area of the overlap distribution to the right of the green
vertical line (the change point $O_c$). The results of $f_{ns}$ for different body
sites are shown in Extended Data Fig. 6a.
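A minimal Python sketch of the first measure follows, assuming the DOC has already been evaluated on an increasing grid of overlap values. The helper names are hypothetical, and the 5-point moving average with a window that shrinks near the ends is an assumption meant to mimic the smoothing mentioned above.

```python
def moving_average(y, span=5):
    """Centred moving average; the window shrinks symmetrically near the ends."""
    half = span // 2
    out = []
    for k in range(len(y)):
        h = min(half, k, len(y) - 1 - k)
        window = y[k - h:k + h + 1]
        out.append(sum(window) / len(window))
    return out


def change_point(grid, doc):
    """Smallest grid overlap O_c such that the smoothed DOC decreases on every
    grid interval to its right; None if the last interval is not decreasing."""
    y = moving_average(doc)
    k = len(grid) - 1
    while k > 0 and y[k] - y[k - 1] < 0:
        k -= 1
    return grid[k] if k < len(grid) - 1 else None


def fraction_negative_slope(grid, doc, pair_overlaps):
    """f_ns: fraction of sample pairs whose overlap exceeds the change point."""
    oc = change_point(grid, doc)
    if oc is None:
        return 0.0
    return sum(1 for o in pair_overlaps if o > oc) / len(pair_overlaps)


grid = [k / 10 for k in range(11)]
doc = [0.5] * 7 + [0.45, 0.40, 0.35, 0.30]   # flat, then decreasing
print(change_point(grid, doc))
print(fraction_negative_slope(grid, doc, [0.1, 0.3, 0.5, 0.7, 0.9]))
```

Note that smoothing moves the detected change point slightly to the left of where the raw curve starts to fall, so the result depends on the smoothing window.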

(2) P value. To estimate the slope of the DOC, we used a linear mixed-effects 
model, which explicitly takes into account the fact that those data points in the 
dissimilarity-overlap plane are not completely independent (because for a data 
set of M samples, each sample affects (M — 1) data points). To avoid any potential 
biases due to the detection of change point, we use data points with overlap larger 
than the median value, that is 50% of all the data points, for all the data sets (from 
all the body sites). We repeated this step for 200 bootstrap realizations. The distri- 
butions of the slopes for different body sites are shown in Extended Data Fig. 6b. 
The one-tailed P values are calculated as the fraction of bootstrap realizations with 
a non-negative slope, and adjusted for multiple comparisons by the procedure of 
Benjamini and Hochberg’”. 
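The last two steps of the second measure (the bootstrap P value and the multiple-comparison adjustment) can be sketched as below. Fitting the linear mixed-effects model is not shown; the sketch assumes the bootstrap slopes are already available, and the function names are hypothetical. The adjustment is the standard Benjamini-Hochberg step-up procedure.

```python
def one_tailed_p(slopes):
    """Fraction of bootstrap realizations with a non-negative slope."""
    return sum(1 for s in slopes if s >= 0) / len(slopes)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda k: pvals[k])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * n / rank)
        adjusted[k] = running_min
    return adjusted


# Hypothetical bootstrap slopes for one body site.
print(one_tailed_p([-0.2, -0.1, 0.05, -0.3]))
# Hypothetical raw P values for four body sites.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With 200 bootstrap realizations, as used here, the smallest non-zero P value this estimator can return is 1/200.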

We emphasize that those two measures are complementary. In the first measure
($f_{ns}$), we consider the existence of a negative slope and ask what fraction of
data points supports it. In the second measure, we consider a fixed fraction
of data points (50%) and ask whether a significant negative slope is observed.
Population dynamics model. The GLV model represents the dynamics
of N interacting species as a set of ordinary differential equations:

$$\frac{dx_i}{dt} = r_i x_i + \sum_{j=1}^{N} a_{ij} x_i x_j, \qquad i = 1, \ldots, N.$$

Here, $r_i$ is the intrinsic growth rate of species i, $a_{ij}$ is the interaction
strength between species j and i, and $a_{ii} x_i^2$ (with $a_{ii} < 0$)
represents the logistic growth term. We considered a microbial ‘sample’ as a steady
state of a GLV model parameterized by the growth rate vector
$r = \{r_i\} \in \mathbb{R}^N$ and the interaction matrix
$A = (a_{ij}) \in \mathbb{R}^{N \times N}$, and we set N = 100 and $a_{ii} = -1$ in
our simulations. We generated different ‘cohorts’, each consisting of M = 100
‘samples’. The GLV models differ from each other in their specific parameters
($r_i$ and $a_{ij}$). To achieve that, for each cohort, we first constructed a
‘base’ GLV model $(r^*, A^*)$ as follows: $r_i^*$ is randomly chosen from the
uniform distribution U(0, 1); $a_{ij}^*$ is randomly chosen from the normal
distribution $N(0, (\sigma \sigma_{max})^2)$, where $\sigma$ varies between
0 and 1 and $\sigma_{max}$ is the maximal interaction strength allowed to ensure
stability of the ecological system (here $\sigma_{max} = 0.1$). Then, different GLV
models ($\nu = 1, \ldots, M$) are generated as random variations of this base model
with $r_i^{(\nu)} = \eta_i^{(\nu)} r_i^*$ and $a_{ij}^{(\nu)} = \xi_{ij}^{(\nu)} a_{ij}^*$,
where both $\eta_i^{(\nu)}$ and $\xi_{ij}^{(\nu)}$ are randomly chosen from a
uniform distribution $U(1 - \delta, 1 + \delta)$ so that the expected values of the
parameters of those models in the same cohort are the same as the base model, that
is, $E[r^{(\nu)}]_\nu = r^*$ and $E[A^{(\nu)}]_\nu = A^*$. In other words, all the
samples of the same cohort are generated from GLV models that share the same
structure and sign pattern of the base interaction matrix $A^*$. As
$\delta \to 0$ the model parameters become identical in all the GLV models of the
same cohort. Thus, $\theta = 1 - \delta$ quantifies the ‘universality’ of the
dynamics of those models. Finally, for each cohort, the 100 samples
(steady states) were generated by integrating the GLV differential equations with
random initial conditions (both initial assemblage and abundance profile are
randomly chosen).
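The cohort construction above can be illustrated with a small Python sketch. It is written under several assumptions that are not stated in the text: forward-Euler integration with a fixed step, a diagonal held at -1 in every perturbed model, and an initial assemblage drawn by removing each species independently with a fixed probability. All function and parameter names are hypothetical, and N is kept small so the pure-Python loop stays fast.

```python
import random


def base_model(n, sigma, sigma_max, rng):
    """Base GLV model: r* ~ U(0, 1), off-diagonal a* ~ N(0, (sigma*sigma_max)^2)."""
    r = [rng.uniform(0, 1) for _ in range(n)]
    A = [[-1.0 if i == j else rng.gauss(0, sigma * sigma_max)
          for j in range(n)] for i in range(n)]
    return r, A


def perturbed_model(r, A, delta, rng):
    """Scale each parameter by an independent U(1 - delta, 1 + delta) factor.
    Assumption: the diagonal (logistic term) is kept fixed at its base value."""
    n = len(r)
    r_v = [ri * rng.uniform(1 - delta, 1 + delta) for ri in r]
    A_v = [[A[i][j] if i == j else A[i][j] * rng.uniform(1 - delta, 1 + delta)
            for j in range(n)] for i in range(n)]
    return r_v, A_v


def glv_steady_state(r, A, x0, dt=0.02, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i (r_i + sum_j a_ij x_j)."""
    n = len(r)
    x = list(x0)
    for _ in range(steps):
        growth = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Absent species (x_i = 0) stay absent; abundances are kept non-negative.
        x = [max(0.0, x[i] + dt * x[i] * growth[i]) for i in range(n)]
    return x


rng = random.Random(0)
r_star, A_star = base_model(n=5, sigma=0.5, sigma_max=0.1, rng=rng)
r_v, A_v = perturbed_model(r_star, A_star, delta=0.2, rng=rng)
# Random initial assemblage and abundances (removal probability is an assumption).
x0 = [0.0 if rng.random() < 0.4 else rng.uniform(0, 1) for _ in range(5)]
print(glv_steady_state(r_v, A_v, x0))
```

Without inter-species interactions each present species simply relaxes to its own growth rate, which is why the presence or absence of other species cannot affect the shared ones in that case.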

Human microbiome data sets analysed in this work. Longitudinal microbiome 
data sets. (1) Two time series of gut microbiome consist of 336 and 131 stool sam- 
ples, respectively. A 16S rRNA gene-based data set, variable region V4, analysed 
here at the OTU level. For detailed description of this data set see ref. 28. The data 
are available at http://qiita.ucsd.edu under study ID 550. (2) Two time series of gut 
microbiome consist of 299 and 180 stool samples, respectively. A 16S rRNA gene- 
based data set, variable region V4, analysed here at the OTU level. For detailed 
description of this data set see ref. 11. The data are available in the European 
Bioinformatics Institute (EBI) European Nucleotide Archive (ENA) under the 
nucleotide accession number ERP006059. 

Cross-sectional microbiome data sets. To compare quantitatively the universality of 
microbial dynamics in different body sites, we used two large-scale microbiome 
data sets. (1) Human Microbiome Project (HMP) (refs 9, 15). A 16S rRNA gene-based
data set, variable regions V3 to V5, of the human microbiome from 239 healthy 
subjects. The data are available at http://hmpdacc.org/ and are detailed in refs 9, 15. 
This data set covers 18 body sites in five areas: the oral cavity (nine sites: saliva 
(M= 262), tongue dorsum (M= 291), palatine tonsils (M = 285), keratinized gin- 
giva (M= 289), hard palate (M= 275), buccal mucosa (M= 287), throat (M= 283), 
and sub- and supragingival plaques (M = 283 and M = 289, respectively)), the gut 
(one site: stool (M= 297)), the vagina (three sites: introitus (M = 115), mid-vagina 
(M= 124), and posterior fornix (M= 124)), the nasal cavity (one site: anterior nares 
(M=230)), and the skin (four sites: left and right antecubital fossae (M= 161 and 
M= 171, respectively) and retroauricular creases (M=240 and M = 257, respec- 
tively)). Full protocols are available on the HMP DACC website (http://hmpdacc. 
org/HMMCP). We performed the DOC analysis at the OTU level. We used a single
sample from each subject. In case more than one sample is available, we used the 
first visit. (2) Student Microbiome Project (SMP). A 16S rRNA gene-based data set, 
variable region V4 from 85 college-aged adults. The data set covers four body sites: 
gut (M=72), tongue (M=79), forehead skin (M=78) and palm skin (M=60). 
In case there are multiple samples measured for one subject, we used the sample
from the first visit. For detailed description of this data set see ref. 16. The data are
available at https://github.com/gregcaporaso/student-microbiome-project/tree/ 
master/otu-tables. 

To rule out several leading candidates of confounding factors in our DOC 
analysis, we analysed two additional data sets. (3) A data set of healthy volun- 
teers (M= 98) from the Cross-sectional Study of Diet and Stool Microbiome 
Composition (COMBO). Diet information was collected using two questionnaires 
that queried recent diet (recall) and habitual long-term diet (food frequency ques- 
tionnaire, FFQ). Stool samples were collected, and DNA samples were analysed by 
454/Roche pyrosequencing of the variable region V1-V2 of the 16S rDNA gene 
segments. We performed the DOC analysis at the OTU level. For detailed descrip- 
tion of this data set see ref. 38. (4) A data set of healthy women (M=53), aged 
20-55 years (median 42.5), as part of the Flemish Gut Flora Project (FGFP). Stool 
consistency levels using Bristol stool scale (BSS) scores were self-reported. The V4 
region of the 16S rDNA gene was sequenced. We performed the DOC analysis at 
the OTU level. For detailed description of this data set see ref. 33. 

Clinical trial data set. Stool samples of patients with rCDI: before and after FMT. 
This clinical trial was approved by the Partners Human Research Committee as 
well as by the US Food and Drug Administration (FDA) (Investigational New Drug 
application number 15199) and registered at ClinicalTrials.gov (NCT01704937).
Informed consent was obtained from all participants. Microbial samples from 
17 patients with rCDI were analysed in the groups of pre-FMT and post-FMT. Only 
subjects for whom both pre- and post-FMT samples are available were included. 
In cases where more than one post-FMT sample is available we included only the 
first one (median, 4 days after FMT). The V4 region of the 16S rRNA gene was 
sequenced using an Illumina MiSeq. We performed the DOC analysis at the OTU 
level. For detailed description of this data set see ref. 17. 

Code availability. The Matlab code for computing the DOC and the universality 
measures as well as an example data set (that is, the data set used to generate 
Fig. 2b) are freely available at the project webpage: http://scholar.harvard.edu/yyl/ 
doc and have been added to the Supplementary Information. 
Extended Data Figure 1 | Displacement of normalized N-dimensional
random walks. a, Trajectory of a two-dimensional random walk
represents the absolute abundance of two species $x_1$, $x_2$. The initial state is
marked by a red circle and the first 100 steps are shown. The solid black
line is the one-dimensional simplex upon which the locations are
projected to obtain the relative abundances $\tilde{x}_1$, $\tilde{x}_2$. The dotted
lines starting at the origin represent the projection process: all the points in a
dotted line have the same relative abundances and they are all projected to the
intersection of the dotted line and the simplex (for example, the solid red
and green circles are projected to the red and green open circles,
respectively). We define a new coordinate $Z(t) = \tilde{x}_2(t) - \tilde{x}_1(t)$
for the location of normalized relative abundance on the simplex. The displacement
of the normalized random walk after t steps is then $Z(t) - Z(0)$, where $Z(0)$ is
the projected location of the initial state (see, as an example, the distance
between the green and the red open circles in a). b, Distributions of
displacement of an ensemble of 1,000 random walks after t steps (t = 1, 5,
10, 100, 1,000). For small t, the displacement distributions depend on t,
while for large t (t = 100, 1,000) the distributions are the same. c, Symbols
represent the average displacement of 1,000 N-dimensional normalized
random walks (here we set N = 50), measured as $D_{rJSD}$, and the error
bars represent the s.d. Each random walk is forced to stay on the positive
orthant, that is, if $x_i^{(t)} < 0$ we set $x_i^{(t)} = 0$. $D_{rJSD}$ was
calculated using all N coordinates, setting $x_i^{(t)} = 10^{-4}$ as a pseudo count
for $x_i^{(t)} = 0$. Where t is small, the distance grows with increasing t;
however, the distance saturates for large t. The dashed red and green lines
represent the average distance between two random locations (green) and between
the final locations ($x^{(t = 1{,}000)}$) of the random walks (red).
Extended Data Figure 2 | Detection of group dynamics using an
ordination technique. a-h, In each row, 500 synthetic samples were 
generated. Samples in the same group were taken from the steady states of 
the same GLV model of 100 species. The initial species assemblages were 
determined in two scenarios: at random (a-d) or on the basis of the group 
(e-h). In the latter scenario, in each group the species were first randomly 
ordered and then in each of the samples the first f species were selected
and the others were removed (f is randomly chosen from a uniform
distribution U(20, 100)). In columns a and e, a standard ordination 
technique, that is, principal coordinate analysis (PCoA), was applied. 

All 500 samples were shown in the plane of the first two principal 
coordinates (using rJSD as the distance metric) and coloured according to 
their group. In b and f, only the samples that have high overlap (>0.95) 
with at least one other sample were shown. Panels c and g show the
dissimilarity distributions P(rJSD) between the high-overlap sample pairs.
Panels d and h show the DOCs. The ordination technique successfully 
detects the existence of group dynamics (especially when the number of 
groups is small). We anticipate that the group dynamics can also be 
detected by classical clustering analysis. In the scenario of random 
collections, the PCoA of high-overlap samples, that is, samples that have 
high overlap (>0.95) with at least one other sample, is doing better than 
the PCoA of all samples to detect group dynamics, especially for a small 
number (~2-10) of groups. Moreover, for a small number of groups, the 
dissimilarity distributions P(rJSD) can distinguish between the two 
scenarios of initial assemblage selection: random or group-based. The 
ordination technique cannot distinguish between the cases of 500 groups 
(individual dynamics) and single group (universal dynamics). Those cases 
can be distinguished by the DOC analysis. 
Extended Data Figure 3 | Detecting universality in population
dynamics models. Synthetic microbial samples were calculated as
steady states of GLV models (see Methods). The GLV models are
generated as cohorts (100 models in each cohort) with different levels of
(i) inter-species interaction strength; and (ii) universality, tuned by the
parameters $\sigma$ and $\theta$, respectively (see Methods). In each of the 100
models, a random fraction f of the species (f ~ U(0, 0.8)) was initially removed, and
the remaining species were initiated with random abundance (x ~ U(0, 1)).
The dissimilarity-overlap points of sample pairs in each cohort and of the
corresponding randomized samples are shown in light blue and yellow,
respectively. The solid curves represent the DOCs calculated using the
robust LOWESS method. The DOC of cohorts generated by GLV models
without inter-species interactions (a1, a4, a7) is flat even in the
high-overlap region. This is because, without inter-species interactions,
for any sample pair the presence or absence of unique (that is, non-shared)
species has no effect on the shared ones. A flat DOC is also observed in the
case of individual dynamics (a7, a8, a9), where a higher overlap between
sample pairs does not lead to more similar abundance profiles. However,
in the case of universal dynamics with strong inter-species interactions
(for example, a3), the DOC displays a clear negative slope in the
high-overlap region.
Extended Data Figure 4 | DOC analysis of gut microbiome samples
from longitudinal studies. a-d, Sample pairs are selected from four
different subjects, with number of samples: $M_a = 299$, $M_b = 180$, $M_c = 336$,
$M_d = 131$, respectively. The mean DOCs (calculated from 100 bootstrap
realizations using the robust LOWESS method) of each subject and the
corresponding randomized samples are shown in dark blue and yellow,
respectively. The shaded area indicates the range of the 94% confidence
intervals. The overlap distributions are shown in red. For all the four
subjects, a clear negative slope of the DOC is observed at the high-overlap
region, indicating largely time-invariant or universal dynamics for each
subject throughout the measurement period. This is in marked contrast
with the flat DOC of the null model (see Supplementary Information
section 1.3). The secondary peak of lower-overlap samples in b (overlap
of ~0.8) is of sample pairs from two different periods, before and after a
Salmonella infection, which represent two distinct microbial steady states
and thus exhibit a flat DOC. This is consistent with our assumption of
time-invariant microbial dynamics for a given healthy individual.
Extended Data Figure 5 | DOC analysis of gut microbiome samples is
consistent across different studies and different dissimilarity measures.
For two microbiome samples, the dissimilarity of their abundance profiles
over shared species can be evaluated by different measures. Weighted
measures, such as rJSD, Bray–Curtis (BC) dissimilarity and Yue–Clayton
(YC) dissimilarity, should be applied to the renormalized abundance
profiles, to ensure mathematical independence between the overlap
and the dissimilarity measures. Rank-based dissimilarity measures, for
example, negative Spearman correlation (nSC), can be directly applied
without renormalization. We used the four dissimilarity measures (rJSD,
BC, YC and nSC) to calculate the DOC (using robust LOWESS) of gut
microbiome samples from two studies: HMP and SMP. In all cases, we
observed a pronounced negative slope in the DOC (dark-blue curve) of
real sample pairs (light-blue points) and a flat DOC (orange curve) for the
pairs of randomized samples (yellow points).
Extended Data Figure 6 | Quantifying the universality of human
microbial dynamics in different body sites. a, The fraction (fs) of data
for which a negative slope is observed in Fig. 3. Note that for overlap values
close to zero (for example, Fig. 3d, f1-4, g1-3) a positive slope occurs as
the artefact of dissimilarity between relative abundance profiles with a small
number of species (see Supplementary Information section 1.1.3). For gut
and mouth, a negative slope of DOC is observed in the two data sets for a
broad range of overlap, indicating a significant universality of microbial
dynamics in those habitats. By contrast, the negative slope of DOC in the
hand’s skin microbiome is observed only for a small part of the sample
pairs. b, Box plot of the slope of DOC calculated from 200 bootstrap
realizations. The slope is calculated by fitting a linear mixed-effects model
for data points with overlap larger than the median. We report one-tailed
P values, calculated as the fraction of bootstrap realizations with a
non-negative slope, adjusted for multiple comparisons by the procedure of
Benjamini and Hochberg. The null hypothesis of non-negative slope is
rejected for all body sites (P< 1 x 10 *) except four skin sites: forehead
(P = 0.099), palm (P = 0.377) in the SMP study and left/right antecubital
fossa in the HMP study (P = 0.099 and P = 0.495).
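
The P-value procedure described in b can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `bootstrap_p_value` and `benjamini_hochberg` and the example slopes are hypothetical, and the slopes are supplied directly instead of being fitted with a linear mixed-effects model.

```python
def bootstrap_p_value(slopes):
    """One-tailed P value: fraction of bootstrap slopes that are non-negative."""
    return sum(1 for s in slopes if s >= 0) / len(slopes)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical bootstrap slopes for three body sites (4 realizations each).
slopes_by_site = {
    "gut": [-0.5, -0.4, -0.6, -0.3],
    "mouth": [-0.2, -0.1, 0.1, -0.3],
    "palm": [0.1, -0.1, 0.2, -0.2],
}
raw = {site: bootstrap_p_value(s) for site, s in slopes_by_site.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

With only a few bootstrap realizations the smallest attainable non-zero P value is coarse (here 0.25), which is one reason a larger number of realizations, such as the 200 used in the caption, is preferable.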


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 7 | Effects of various host factors on the DOC
analysis. a, The effect of body mass index (BMI) on the DOC analysis.
a1, DOC analysis of all gut microbiome sample pairs among 190 subjects
from the HMP study. Red points represent sample pairs associated with
at least one obese subject (with BMI > 30). a2, Same as in a1, but 13 obese
subjects with BMI > 30 were excluded. a3, Overlap versus ΔBMI. Blue points
represent the overlap and ΔBMI of gut microbiome sample pairs. The red
curve is the average (error bars represent the s.e.m.). a4, Dissimilarity
versus ΔBMI. a5, Distribution of ΔBMI values, divided into four groups of
equal number of pairs. a6–a9, DOC analysis of the sample pairs in each group.
b, The effect of diet on the DOC analysis. b1, Diet difference (Δdiet)
between two subjects is defined as the Euclidean distance between their
associated diet scores in the two leading principal components PC1 and
PC2. In total there are M = 97 healthy subjects in the COMBO study*®.
b2, Overlap versus Δdiet. Blue points represent the overlap and Δdiet of
all gut microbiome pairs among the 97 subjects from the COMBO
study. The red curve is the average (error bars represent the s.e.m.).
b3, Dissimilarity versus Δdiet. b4, Distribution of Δdiet values, divided
into four groups of equal number of pairs. b5–b8, DOC analysis of the
pairs in each group. c, The effect of age on the DOC analysis. c1, Overlap
versus Δage. Blue points represent the overlap and Δage of all gut
microbiome sample pairs between the 190 subjects from the HMP
study. The red curve is the average (error bars represent the s.e.m.).
c2, Dissimilarity versus Δage. c3, Distribution of Δage values, divided
into four groups of equal number of pairs. c4–c7, DOC analysis of the
pairs in each group. d, The effect of stool consistency on the DOC analysis.
d1, DOC analysis of all sample pairs. In this data set the subjects have
BSS values between 1 and 6. The points (sample pairs) associated with
subjects with BSS = 6 (at least one subject has BSS = 6) are coloured in
red. The black line is the DOC. d2, DOC analysis of all subjects with
BSS < 6. d3, d4, Among all subjects with 1 < BSS < 5, the overlap and the
dissimilarity are independent of ΔBSS. d5, Distribution of ΔBSS values
for the 46 subjects with 1 < BSS < 5. d6, d7, DOC analysis of the pairs with
similar BSS values, 0 < ΔBSS < 1 (d6), and pairs with more different BSS
values, 2 < ΔBSS < 4 (d7). In both cases, a clear negative slope of the DOC
is observed. e, The effect of race on the DOC analysis. e1, All subjects
(M = 190). e2, e3, White subjects (M = 153) (e2) and Asian subjects
(M = 25) (e3). Note that in the HMP study, stool samples were collected
from 153 white subjects, 10 black subjects, 25 Asian subjects,
and 2 subjects from other races.
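
The pairwise host-factor differences used in this figure can be illustrated with a short sketch. It assumes each subject has a pair of diet scores (PC1, PC2) and shows how Δdiet and the four equal-sized groups of pairs might be computed; the function names, the subject labels and the scores are hypothetical and serve only as an example.

```python
import math
from itertools import combinations


def delta_diet(score_a, score_b):
    """Euclidean distance between two subjects' (PC1, PC2) diet scores."""
    return math.dist(score_a, score_b)


def equal_size_groups(pairs, n_groups=4):
    """Sort (pair, value) tuples by value and split them into groups of near-equal size."""
    ranked = sorted(pairs, key=lambda item: item[1])
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)  # spread the remainder
        groups.append(ranked[start:end])
        start = end
    return groups


# Hypothetical (PC1, PC2) diet scores for five subjects.
scores = {
    "s1": (0.0, 0.0),
    "s2": (3.0, 4.0),
    "s3": (6.0, 8.0),
    "s4": (-3.0, -4.0),
    "s5": (0.0, 10.0),
}
pairs = [((a, b), delta_diet(scores[a], scores[b])) for a, b in combinations(scores, 2)]
groups = equal_size_groups(pairs)
```

The same grouping function applies unchanged to ΔBMI, Δage or ΔBSS, since it only needs one scalar difference per sample pair.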
Extended Data Figure 8 | DOC analysis under special conditions.
a, The effect of strongly interacting species. A comparison of two GLV
models of 100 species with random inter-species interactions. The
system parameters were fixed for all the simulated samples (M = 100),
representing maximal universality. In a1, all species have the same
characteristic interaction strength, while in a2, the inter-species
interactions of one species are markedly stronger than those of all other
species, representing a strongly interacting species. The presence/absence
of the strongly interacting species markedly affects (either directly or
indirectly) the abundance profile of many other species, leading to a
pronounced secondary cloud of points in the dissimilarity–overlap plane
(a4). The effect is most pronounced in the region of high-overlap (top 5%)
pairs, and can be detected by looking at their dissimilarity distributions
(a5, a6). b, The DOC behaves the same for samples with uniform or skewed
abundance distributions. b1, b2, Samples were generated from the steady
states of the GLV model with largely uniform abundance distribution
(determined mainly by the species growth rates). In the presence of
inter-species interactions (b1), a negative slope of the DOC is observed.
By contrast, in the absence of inter-species interactions (b2), a flat DOC
is observed. b3, Real samples from the gut (from the HMP study, genus level)
exhibit a high level of alpha-diversity and a very skewed abundance
distribution. A negative slope of the DOC in the high-overlap region is
observed. b4, The randomized samples preserve the abundance distribution of
the real samples but the effect of inter-species interactions is removed,
leading to a flat DOC. c, Effect of core species and non-interacting
periphery species. c1, Samples were generated as steady states of the
GLV model with N = 100 species. The parameters of the GLV model were
fixed for all the samples, representing maximal universality. The initial
species assemblages were chosen as follows: 30 species were present
in all the samples, representing a set of ‘core’ species, and the other 70
‘peripheral’ species were present with lower probability (mean 0.18,
min 0.12, and max 0.24). c2, Presence probability of real gut microbial
samples, from the HMP at the genus level. Only one genus (Bacteroides)
is present in all the samples. c3, Species presence probability in a GLV
model where all species are present with average probability 0.6. c4, The
effect of the interactions of the peripheral species. In the GLV model,
the inter-species interactions among the core species (core–core) have a
characteristic strength σ_core = 0.15, and both the periphery–periphery and
the periphery–core interactions have a characteristic strength σ_p. When
σ_p = 0, that is, the peripheral species do not interact with the core
species, the DOC is flat. When σ_p > 0, the DOC has a negative slope.
c5, c6, In the case of real gut microbiome samples as well as the GLV model
without core species, the DOC has a negative slope in the high-overlap
region. d, The effect of sequencing depth on the DOC analysis. d1, Richness
(number of present OTUs) versus sequencing depth of 190 HMP gut samples.
12 subjects with fewer than 1,300 reads per sample were excluded and
the remaining 178 were assigned into two groups of n = 89 subjects,
with average sequencing depth 3,019 and 8,640 reads per sample.
d2, d3, The characteristic overlap between samples of group 1 is smaller
than between samples of group 2. However, DOC analysis of each group
shows a clear negative slope. d4–d6, Samples of each group were rarefied
before analysis with minimal community size of 1,317 and 4,333 in group 1
and 2, respectively, as represented by the black dashed lines in d4.
d7–d9, Samples of both groups were rarefied before analysis with the same
minimal community size of 1,317, as represented by the black dashed line
in d7.
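
The rarefaction step in d4–d9 can be sketched as subsampling reads without replacement down to a common minimal community size. The sketch below is illustrative only: the function name `rarefy`, the fixed seed and the OTU count tables are hypothetical (the two totals were chosen to match the minimal community sizes quoted above), and the study's actual rarefaction tool is not specified in this caption.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of OTU -> read count."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand the count table into one entry per read, then draw without replacement.
    reads = [otu for otu, c in counts.items() for _ in range(c)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


# Hypothetical OTU count tables for two samples of different depth.
samples = {
    "shallow": {"otu1": 900, "otu2": 400, "otu3": 17},
    "deep": {"otu1": 3000, "otu2": 1200, "otu3": 100, "otu4": 33},
}
# Common minimal community size across the samples being compared.
min_size = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, min_size) for name, c in samples.items()}
```

Rare OTUs in the deeper sample may drop out after rarefaction, which is why richness and overlap are compared only after both groups are brought to the same depth.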
Extended Data Figure 9 | DOC analysis of longitudinal microbiome
data from six lakes in Germany*’. Data downloaded from
http://qiita.microbio.me, study ID 945. a, Stechlin (M = 440). b, Haus (M = 26).
c, Tiefwaren (M = 164). d, Melzer (M = 68). e, Breiter Luzin (M = 89).
f, Fuchskuhle (M = 355). Blue points represent the dissimilarity–overlap
values of sample pairs from the same lake. The DOCs of real samples
from each lake and that from the corresponding randomized samples
are calculated using robust LOWESS and shown in red and yellow,
respectively. For all six lakes, a clear negative slope is observed for the
DOCs of real samples, suggesting universal or time-invariant microbial
dynamics for each lake. Differences in the DOC shapes (for example, the
moderate DOC slope in b, c and d, in contrast with the steep DOC in
a, e and f) deserve a systematic study of those microbial ecosystems. This
example clearly demonstrates the applicability of DOC analysis to general
microbial ecosystems, for example, soil, ocean, rhizosphere/phyllosphere
and fermenters.
Extended Data Figure 10 | Average dissimilarity between two
normalized random vectors. Two independent vectors x, y of n elements
randomly chosen from the uniform distribution U(0, 1) were generated
and then normalized, $\tilde{x}_i = x_i / \sum_{j=1}^{n} x_j$ and
$\tilde{y}_i = y_i / \sum_{j=1}^{n} y_j$. (Note that in practice all
n elements are always shared in $\tilde{x}$ and $\tilde{y}$, since zeros
are very unlikely.) The dissimilarity $D(\tilde{x}, \tilde{y})$ is then
calculated using the five dissimilarity measures ($D_{\mathrm{JSD}}$,
$D_{\mathrm{rJSD}}$, $D_{\mathrm{BC}}$, $D_{\mathrm{YC}}$ and
$D_{\mathrm{nSC}}$). Average dissimilarity and
standard deviations of 1,000 pairs are shown in a1, b1, c1, d1 and e1,
for the different measures. The horizontal black dashed line represents
the average dissimilarity for n = 100. For all the measures here, the
dissimilarity displays no n-dependence for n > 15, while $D_{\mathrm{nSC}}$
is n-independent for any n > 0. Similar analysis was performed for
vectors whose elements were chosen from power-law distributions
$P(x) \sim x^{-\alpha}$ with $\alpha = 3$ (a2, b2, c2, d2 and e2) and
$P(x) \sim x^{-\alpha}$ with $\alpha = 2$ (a3, b3, c3, d3 and e3).
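
The numerical experiment in this caption can be sketched for two of the five measures. The code below is an illustration under stated assumptions: it uses the standard definitions of Bray–Curtis dissimilarity and root Jensen–Shannon divergence (with base-2 logarithms, which is an assumption here), and the function names, seed and chosen values of n are hypothetical.

```python
import math
import random
import statistics


def normalize(v):
    """Rescale a vector so that its elements sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity; for normalized vectors it equals 0.5 * sum|x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def rjsd(x, y):
    """Root Jensen-Shannon divergence (base-2 logs, so values lie in [0, 1])."""
    def kl(p, q):
        return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)
    m = [(a + b) / 2 for a, b in zip(x, y)]
    return math.sqrt(max(0.0, (kl(x, m) + kl(y, m)) / 2))


def average_dissimilarity(measure, n, n_pairs=1000, seed=0):
    """Mean and standard deviation of a measure over pairs of normalized U(0, 1) vectors."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        x = normalize([rng.random() for _ in range(n)])
        y = normalize([rng.random() for _ in range(n)])
        values.append(measure(x, y))
    return statistics.mean(values), statistics.stdev(values)


# Mean and s.d. for a few vector lengths n, for each of the two measures.
results = {
    name: {n: average_dissimilarity(fn, n) for n in (5, 15, 40, 100)}
    for name, fn in (("BC", bray_curtis), ("rJSD", rjsd))
}
```

Comparing the entries of `results` across n gives a quick check of how strongly each measure depends on the number of shared elements.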
Development of the gut microbiota and mucosal IgA
responses in twins and gnotobiotic mice 


Joseph D. Planer¹,², Yangqing Peng¹,², Andrew L. Kau¹,², Laura V. Blanton¹,², I. Malick Ndao³, Phillip I. Tarr³, Barbara B. Warner³ &
Jeffrey I. Gordon¹,²


Immunoglobulin A (IgA), the major class of antibody secreted by the 
gut mucosa, is an important contributor to gut barrier function’ °. 
The repertoire of IgA bound to gut bacteria reflects both T- 
cell-dependent and -independent pathways*’, plus glycans present 
on the antibody’s secretory component®. Human gut bacterial taxa 
targeted by IgA in the setting of barrier dysfunction are capable 
of producing intestinal pathology when isolated and transferred 
to gnotobiotic mice”®. A complex reorientation of gut immunity 
occurs as infants transition from passively acquired IgA present in 
breast milk to host-derived IgA*!'. How IgA responses co-develop 
with assembly of the microbiota during this period remains poorly 
understood. Here, we (1) identify a set of age-discriminatory 
bacterial taxa whose representations define a program of 
microbiota assembly and maturation during the first 2 postnatal 
years that is shared across 40 healthy twin pairs in the USA; 
(2) describe a pattern of progression of gut mucosal IgA responses 
to bacterial members of the microbiota that is highly distinctive 
for family members (twin pairs) during the first several postnatal 
months then generalizes across pairs in the second year; and 
(3) assess the effects of zygosity, birth mode, and breast feeding. 
Age-associated differences in these IgA responses can be recapitulated 
in young germ-free mice, colonized with faecal microbiota obtained 
from two twin pairs at 6 and 18 months of age, and fed a sequence 
of human diets that simulate the transition from milk feeding to 
complementary foods. Most of these responses were robust to diet, 
suggesting that ‘intrinsic’ properties of community members play a 
dominant role in dictating IgA responses. The approach described 
can be used to define gut mucosal immune development in health 
and disease states and to help discover ways of repairing or preventing 
perturbations in this facet of host immunity. 

To define the relationship between assembly of the gut community 
and gut mucosal IgA responses, we collected faecal samples monthly 
for the first 24-36 months of postnatal life from each member of a 
birth cohort of 40 twin pairs (21 monozygotic) who lived in the greater 
metropolitan area of a single city in the USA (St. Louis, Missouri). All 
twins had healthy growth phenotypes as judged by serial anthropom- 
etry; 13 pairs were delivered vaginally, 24 by Caesarean section, and 
three pairs were discordant for mode of birth; 96% received breast milk, 
infant formula, or a combination of the two as the predominant food 
source throughout the first 6 months of postnatal life (Supplementary 
Tables 1-4). 

Gut microbiota assembly was defined following an approach based 
on our previous studies of healthy Bangladeshi and Malawian infants 
and children'”"'°. We generated a random forests (RF)-derived model 
of microbiota development from a bacterial V4-16S rRNA data 
set generated from 1,670 faecal samples collected from the 40 twin 
pairs (20.9 ± 6.2 (mean ± s.d.) samples per individual). The sparse
RF-generated model, based on the relative abundances of the 25 most 
age-discriminatory operational taxonomic units (OTUs), could predict 


chronological age for members of twin pairs as well as for biologically 
unrelated individuals (OTUs defined by mapping sequenced reads to 
a reference database of 16S rRNA sequences; see Methods, Extended 
Data Figs 1 and 2 and Supplementary Tables 5-8). We then conducted 
a series of reciprocal tests with the data sets we generated from the 
three birth cohorts. We applied each sparse model to the population of 
healthy infants and children from which it was generated as well as to 
the other two populations. We found that the USA model performed 
consistently across the three populations (Spearman’s correlation coef- 
ficients of 0.73 and 0.78 for the Bangladeshi and Malawian data sets, 
respectively; see Methods and Supplementary Table 9). 
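
The agreement between model-predicted microbiota age and chronological age is summarized above by Spearman's correlation coefficient. As a minimal sketch of that statistic (not of the random forests model itself), the code below computes Spearman's rho as the Pearson correlation of average ranks; the function names and the example ages are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based), with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical chronological ages (months) and model-predicted microbiota ages.
chronological = [1, 3, 6, 9, 12, 18, 24]
predicted = [2.0, 2.5, 7.0, 6.5, 13.0, 17.0, 20.0]
rho = spearman(chronological, predicted)
```

Because the statistic depends only on ranks, a model can score well even when its predicted ages are systematically shifted, as long as the ordering of samples by age is preserved.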

Although previous studies have identified taxa that are shared 
more commonly between adult monozygotic compared with 
dizygotic twin pairs'*!°, our analysis indicated that none of the 
25 age-discriminatory OTUs showed significantly greater concord- 
ance in their relative abundances in monozygotic compared with 
dizygotic twin pairs (Supplementary Table 10). The impact of age, 
family, milk feeding history, and birth mode on the overall phylo- 
genetic configuration of the microbiota was evaluated with a permu- 
tational multivariate analysis of variance (PERMANOVA) and the 
UniFrac metric. Family had the largest effect (Extended Data Fig. 3), 
followed by age, and milk feeding (that is, breast milk versus for- 
mula) (36%, 11%, and 1%, respectively, when considering only those 
samples with associated feeding data; P< 0.001 for all variables 
except birth mode, which did not have a significant effect). A previ- 
ous study, conducted in the immediate postpartum period, reported 
that infants born by Caesarean section have a greater representa- 
tion of skin-derived taxa than those that were vaginally delivered’®. 
A caveat to our study is that we were not able to determine the very 
early effects of birth mode since the median time point for first faecal 
sampling was postpartum day 52. 

Faecal biospecimens were categorized as obtained from donors 
who were ‘predominantly formula fed’ or ‘predominantly breast fed’ 
at the time of sampling (‘predominant’ defined as comprising >50%
of that individual’s total milk feedings; Extended Data Fig. 4a and
Supplementary Table 2). Linear mixed-effects modelling disclosed that 
milk feeding practice had a significant effect on maturity (P< 0.001, 
ANOVA with predicted microbiota age as the dependent variable and 
individual/family/chronological age as nested effects). In a post-hoc 
analysis, infants receiving >50% of their milk from formula feedings 
had significantly accelerated development of their microbiota during 
the first 6-7 months of postnatal life compared with infants receiving 
most of their milk from breastfeeding (n =619 and 127 samples, respec- 
tively; Mann-Whitney U-test). These differences were no longer statis- 
tically significant by 12 months (Extended Data Fig. 4b). This finding 
can be explained in part by the significantly lower aggregate relative 
abundance of members of the genus Bifidobacterium represented in 
the RF model in the faecal microbiota of formula-fed infants (Extended 
Data Fig. 4c, Supplementary Table 11 and ref. 17). 


¹Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA. ²Center for Gut Microbiome and Nutrition Research, Washington
University School of Medicine, St. Louis, Missouri 63110, USA. ³Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
Faecal samples collected during the first postnatal month, and at
3-month intervals thereafter from each member of the 40 twin pairs,
were subjected to fluorescence-activated cell sorting (FACS) to characterize
the patterns of IgA targeting of bacterial taxa in their developing
microbiota (Supplementary Table 12; see Methods for a description of
‘BugFACS’ with anti-human IgA). V4-16S rRNA gene sequencing was
performed on three fractions generated from each sample (‘input’, IgA⁺,
and IgA⁻). The differential representation of a given taxon between
the IgA⁺ and IgA⁻ fractions is expressed in the form of a log-normalized
‘IgA index’ that ranges, in theory, from −1 to 1, with positive and
negative values indicating enrichment in the IgA⁺ and IgA⁻ fraction,
respectively (Fig. 1a). IgA indices are not a simple reflection of the
relative abundances of organisms in the input fraction (Extended Data
Fig. 5a).
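
One formulation consistent with this description of the IgA index is sketched below. The specific form, minus the difference of the log relative abundances divided by their sum, and the pseudocount value are assumptions made for illustration; the Methods define the index actually used. The function name `iga_index` is hypothetical.

```python
import math


def iga_index(iga_pos, iga_neg, pseudocount=0.001):
    """Log-normalized IgA index from a taxon's relative abundances in the two sorted fractions.

    The form -(log a - log b) / (log a + log b) and the pseudocount value are
    assumptions for illustration. Abundances are fractions, and each abundance
    plus the pseudocount must stay below 1 so that both logarithms are negative.
    """
    log_pos = math.log10(iga_pos + pseudocount)
    log_neg = math.log10(iga_neg + pseudocount)
    return -(log_pos - log_neg) / (log_pos + log_neg)


# Hypothetical relative abundances of one taxon in the IgA+ and IgA- fractions.
enriched_in_pos = iga_index(0.10, 0.001)   # taxon concentrated in the IgA+ fraction
enriched_in_neg = iga_index(0.001, 0.10)   # taxon concentrated in the IgA- fraction
untargeted = iga_index(0.05, 0.05)         # equal representation in both fractions
```

Under this form, equal representation in the two fractions gives an index of zero, and swapping the fractions flips the sign, which matches the stated interpretation of positive and negative values.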

We identified 30 OTUs that were significantly enriched in either the
IgA⁺ or IgA⁻ fraction in three or more age bins (Fig. 1a). Seven OTUs
exhibited consistently positive IgA indices after the third month of life, 
including two age-discriminatory members of the sparse RF-generated 
model of gut microbiota development (Clostridium nexile OTU 
4436046, Bifidobacterium bifidum OTU 365385; Fig. 1a). Seventeen 
OTUs remained untargeted throughout the first 24 months, including 
six OTUs in the RF-based model (Fig. 1a). Two OTUs manifested 
significant differences in their IgA targeting during the first 2 postnatal 
years: B. longum (OTU C.1) and Escherichia coli (OTU C.3) (Extended 
Data Fig. 5b and Supplementary Table 13). 

We performed an indicator species analysis" across all time points to 
obtain a metric complementary to the IgA index that could describe the 
strength of partitioning of the 30 OTUs into the IgA⁺ or IgA⁻ fractions.
The results were largely concordant with those obtained from the IgA 
index-based analysis and provided an additional level of resolution 
of the temporal patterns and specificity of targeting (Supplementary 
Table 14 and Extended Data Fig. 6). 

IgA indices were highly correlated within twin pairs during the first 
21 months of life (Fig. 1b). Indices between unrelated infants were very 
weakly correlated during the first 6 postnatal months, became increas- 
ingly more correlated during the second year of life, and by 24 months 
co-twins no longer had an IgA response that was significantly more 
similar to one another than to other unrelated children (Wilcoxon 
signed-rank test; Fig. 1b). As the effects of family membership dimin- 
ished, variation of the IgA index for a given taxon across the population 
of twins also diminished (Extended Data Fig. 5c). The similarity in the 
IgA profiles between mothers sampled during the first 12 postpartum 
months (39 mothers; 3.0 ± 1.0 (mean ± s.d.) samples per mother) and
children at 24 months of life supports the notion that development of 
a child’s gut mucosal IgA responses reaches a state of maturation that 
resembles that of adults by this age (Supplementary Tables 15 and 16 
and Fig. 1c).

On the basis of Pearson’s correlation distance, we determined that 
age and family membership explained the most variance in IgA indices 
(25% and 19%, respectively), while zygosity and mode of delivery had 
small but statistically significant effects (0.6% and 0.5%, respectively; 
PERMANOVA with 999 permutations). Breastfeeding explained 5% 
of the variance in the model (P < 0.001 for breast milk versus formula 
feeding as well as for age and family). Intriguingly, IgA targeting of two 
taxa, E. coli (OTU C.3) and Ruminococcus gnavus (OTU C.4), varied
between children who were predominantly breastfed and those who 
were predominantly formula-fed, with breastfed children exhibiting 
significantly higher IgA targeting of E. coli at 3 months of age and sig- 
nificantly lower IgA targeting of R. gnavus during the latter half of the 
first year (Extended Data Fig. 7). 

To quantify the stage of development of gut mucosal IgA responses, 
we randomly selected 20 unrelated children from the healthy twin 
cohort and generated an RF model based on the IgA indices to the 
30 taxa identified in Fig. 1a. The model was then applied to unrelated 
children represented in the remaining 20 twin pairs (‘test set’; n = 40).
Even though the data set was smaller than the one used to generate the 
RF model of gut microbiota development, the effort produced a model
of development of IgA responses that correlated with donor chrono- 
logical age (Spearman's correlation for training set and test set, 0.97, 
and 0.72, respectively; Supplementary Tables 13 and 17). 

Faecal samples from two twin pairs whose pattern of gut microbiota 
development was well described by the RF-derived model and whose 
IgA responses exemplified those of the larger population were selected 
for transplantation into germ-free mice (pairs 4 and 40 in Supplementary 
Table 13). Both twin pairs were predominantly formula-fed 
throughout their first postnatal year (Supplementary Table 2). 
Faecal samples, collected from each of these four individuals when they 
were 6 and 18 months old, were introduced into separate groups of 
male 5-week-old C57BL/6J germ-free mice (n = 4 or 5 mice per donor 
sample; eight treatment groups). Two days before gavage, all mice were 
switched from a standard chow diet low in fat and rich in plant poly- 
saccharides to a sterilized human infant formula diet (see Methods). 
After gavage, animals were maintained on this diet for 14 days and 
then switched to a diet constructed on the basis of a survey of fruits 
and vegetables most commonly consumed by infants transitioning to 
complementary foods”. This diet consisted of isocaloric amounts of the 
powdered infant formula diet and a mixture of sweet potatoes, green 
beans, bananas, and apples. After 10 days, animals were returned to the 
infant formula diet for another 10 days. Faecal samples were obtained 
from recipient mice at frequent intervals throughout all diet phases 
and subjected to 16S rRNA sequencing and/or to BugFACS (Fig. 1d, e). 

Indicator species analysis revealed that the abundances of 15 of the 
top 60 taxa in the RF-derived model of gut microbiota maturation var- 
ied significantly in the context of one or the other diets (false discovery 
rate (FDR)-corrected P <0.05 and indicator value >0.5; Supplementary 
Tables 18 and 19). Most of these taxa responded in the same direction 
(increased or decreased in abundance) to the different diets, inde- 
pendent of the microbiota donor or donor age (Extended Data Fig. 8). 
For example, the age-discriminatory OTU C.1 (B. longum) and OTU 
4439469 (a member of the Ruminococcaceae) have highest mean rela- 
tive abundances during the first 6 months of postnatal life in members 
of the twin cohort (Extended Data Fig. 2c); these taxa also exhib- 
ited significantly greater relative abundance in the faecal microbiota 
of recipient gnotobiotic mice when they were consuming the infant
formula diet. In contrast, Anaerostipes caccae (OTU 259772), which 
peaks in abundance during the latter half of the first postnatal year 
(the period corresponding to introduction of complementary foods in 
our twin study; Extended Data Fig. 2c and Extended Data Fig. 4a), was 
significantly higher in its abundance during the ‘formula plus fruits and 
vegetables’ diet phase (Extended Data Fig. 8). 

To determine whether age-associated differences in IgA responses 
to components of the donors’ microbiota could be recapitulated in 
gnotobiotic mice, we subjected their faecal samples collected at 7, 14, 
24, and 34 days after gavage to BugFACS (Supplementary Table 20). 
IgA responses in mice broadly mirrored those of the human donor 
population; taxa that were consistently not targeted across members of 
the twin cohort during the first 2 years of postnatal life (for example, 
Clostridium clostridioforme OTU C.26 and Clostridium bolteae OTU 
4469576) were generally not targeted in mice colonized with the 6- and 
18-month microbiota samples from the two twin pairs, while bacteria 
targeted in mice belonged to the set of taxa that were consistently IgA 
targeted in infants/children from postnatal months 6-24 (for example, 
Ruminococcus torques (OTU C.6) and Akkermansia muciniphila (OTU 
4306262)) (Fig. 1d and Supplementary Table 21). 

IgA-targeting of five OTUs varied significantly with the diet oscil- 
lation, whether judged by a comparison of the first and second or 
second and third diet phases (FDR-corrected repeated-measures 
ANOVA): they included OTUs whose IgA targeting increased during 
the fruits and vegetables phase (4306262 (A. muciniphila) and C.39 
(Ruminococcus sp. ce2)), and those whose targeting decreased (OTUs 
C.4 (R. gnavus), 4469576 (C. bolteae), 4453304 (other Clostridiales)) 
(Supplementary Table 22). IgA responses were most similar in mice 
harbouring a given donor microbiota, and were more similar within
members of a twin pair than between unrelated children (Extended 
Data Fig. 9). 

Applying our RF-derived model of maturation of human gut mucosal 
IgA responses to the mouse BugFACS data set showed that animals 
recapitulated distinct age-associated differences in mucosal IgA 
responses to the (transplanted) human microbiota; that is, for both 
twin pairs, the state of maturation of the IgA response in mice was 
significantly greater when animals were colonized with the 18-month 
compared with the 6-month donors’ communities. Remarkably, this 
significant difference in age-associated responses for a given co-twin 
or twin pair microbiota was evident in both diet contexts (Fig. 1f and 
Supplementary Table 21). We concluded that IgA responses to mem- 
bers of the 6-month-old gut microbiota were shared across the two twin 
pairs and robust to the transition to complementary foods. The fact 
that distinctive responses to the 18-month compared with the 6-month
microbiota were identified in recipient mice, even in the context of a
milk (formula) diet, supports the notion that ‘intrinsic’ properties of
community members (for example, properties not clearly related to
taxonomy or obviously affected by community composition) play a
dominant role in dictating the gut mucosal IgA targeting response.

(n = 4 or 5 mice per group, 143 faecal samples analysed).
Mean values ± s.e.m. are plotted. *P < 0.05; **P < 0.01;
***P < 0.001; ****P < 0.0001 (Mann–Whitney U-test for
the indicated comparisons).

Our findings point to several directions for future investigation. The 
stability of the IgA molecule, the ease and safety of obtaining faecal 
samples, and the ability to sort members of a faecal microbiota sam- 
ple into IgA-enriched versus non-enriched fractions provide a way 
to non-invasively quantify states of development of the gut mucosal 
immune system as a function of different host and environmental fac- 
tors. BugFACS offers an opportunity to identify previously unappreci- 
ated ‘IgA deficiencies’ presenting not as a lack of, or reduction in, the 
amount of total IgA in the gut lumen, but rather as aberrant patterns 
of IgA targeting. The effects of such deficiencies would need to be 
examined with the understanding that barrier function can be affected 
by multiple factors besides IgA, including for example mucin”, and less
well understood elements”!. BugFACS also provides a way to explore 
how IgA targeting of bacterial taxa varies as a function of their prox- 
imity to the intestinal epithelium and their location along the length 
of the gut. The importance of the small intestine as a source of the 
T-cell-independent IgA response to members of the microbiota was 
highlighted in a recent study. 

In principle, deviations from the pattern of convergence of IgA 
responses observed in the present study could occur in scenarios 
where colonization is abnormal, resulting in pathological immune 
responses in anatomically distant locations, such as that observed in 
asthma or various autoimmune/immunoinflammatory disorders. 
One obvious next step is to assess the generalizability of the shared 
pattern observed in this study by characterizing healthy members of birth 
cohorts representing different geographical areas, distinctive cultural 
and dietary traditions, and living environments with varying degrees 
of sanitation. Gut microbiota development is impaired in children with 
undernutrition. Given that undernutrition is associated with impaired 
gut barrier function and responses to particular vaccines, a comparison 
of the development of gut mucosal IgA responses to members 
of the microbiota in healthy and undernourished members of birth 
cohorts could provide a metric for disease classification, assessment of 
the impact of enteropathogen infection/burden, and a means of assess- 
ing the efficacy of current or new therapeutic interventions, including 
approaches for oral vaccination. 

The ability to re-enact and recapitulate features of the development 
of gut mucosal IgA responses to human donor gut microbial commu- 
nities in wild-type or genetically manipulated gnotobiotic mice should 
help delineate the mechanisms that control the temporal evolution 
and specificity of IgA responses to members of the gut community, 
the effects of the IgA response on targeted microbes and other mem- 
bers of the microbiota, as well as the impact on host biology. As such, 
these models could be used to identify new strategies for deliberately 
manipulating mucosal barrier/immune function, including food-based 
and/or microbial interventions. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references 
unique to these sections appear only in the online paper. 
METHODS 


Human studies. Protocols used for recruitment of participants, obtaining 
informed consent, collecting and de-identifying faecal samples, and acquiring 
and de-identifying clinical metadata were all approved by the Human Research 
Protection Office of Washington University School of Medicine. A total of 40 twin 
pairs were included in this study; this number was not determined by a power 
calculation. 

All breast milk provided was from the mothers of the twins themselves and was 
not pasteurized. Breast milk was given from the breast directly or from bottles after 
being expressed by the mother. Expressed milk was given immediately or stored 
temporarily in home freezers; in the latter case, mothers were instructed to thaw 
their milk in warm water. 

Determination of zygosity. Zygosity testing of same-gender twins was performed 
on residual blood samples obtained for clinical care, or samples obtained at the time 
of mandatory Missouri-state metabolic testing. Short tandem repeat polymorphic 
DNA markers were amplified from blood DNA by PCR, labelled with fluorescent 
markers, and separated by capillary electrophoresis to distinguish different alleles 
at each of ten different loci (D3S1358, vWA, FGA, Amelogenin, D8S1179, D21S11, 
D18S51, D5S818, D13S317, and D7S820). 

V4-16S rRNA gene sequencing and data analysis. Faecal samples were quickly 
frozen at −20°C and subsequently stored at −80°C. Samples were pulverized 
in liquid nitrogen and DNA was extracted from an aliquot of the material 
(130 ± 36 mg; mean ± s.d.) by bead-beating in a solution consisting of 500 µl of 
phenol:chloroform:isoamyl alcohol (25:24:1), 210 µl of 20% SDS, and 500 µl of 
buffer A (200 mM NaCl, 200 mM Trizma base, 20 mM EDTA). DNA was further 
purified with Qiaquick columns (Qiagen), eluted in 70 µl of Tris-EDTA (TE) buffer, 
and quantified (Quant-iT dsDNA broad range kit; Invitrogen). The concentration 
of each DNA sample was normalized to 1 ng µl⁻¹ and the DNA was subjected to 
PCR with phased, barcoded primers directed against variable region 4 of the bacterial 
16S rRNA gene. Amplicons were quantified as above, pooled, and sequenced 
on an Illumina MiSeq instrument (paired-end 250 nucleotide reads). Paired-end 
reads were merged (FLASH, version 1.2.6). De-multiplexed reads were clustered 
into OTUs with the 97% identity sequence set from the GreenGenes 2013 reference 
database and QIIME version 1.8 (ref. 25). An ‘abundance-filtered data set’ was 
generated by selecting OTUs that were detected at >0.1% relative abundance in 
>1% of the samples; only these OTUs were considered for further analysis. 
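
The abundance filter described above can be sketched as follows. This is a minimal illustration, assuming a nested dictionary of read counts per sample; the function name and data layout are hypothetical and are not taken from the QIIME workflow used in the study.

```python
def abundance_filter(counts, min_rel_abund=0.001, min_sample_frac=0.01):
    """Return the OTUs detected at > min_rel_abund in > min_sample_frac of samples.

    counts: dict mapping sample_id -> dict mapping otu_id -> read count
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(counts)
    hits = {}  # otu_id -> number of samples in which the OTU passes the abundance cutoff
    for otu_counts in counts.values():
        total = sum(otu_counts.values())
        if total == 0:
            continue  # skip empty samples rather than divide by zero
        for otu, reads in otu_counts.items():
            if reads / total > min_rel_abund:
                hits[otu] = hits.get(otu, 0) + 1
    return {otu for otu, k in hits.items() if k / n_samples > min_sample_frac}


example = {
    "s1": {"A": 1999, "B": 1},   # B is at 0.05% here, below the 0.1% cutoff
    "s2": {"A": 500, "C": 500},
}
kept = abundance_filter(example)
```

Only OTUs in the returned set would be carried forward to the later steps.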

Taxonomy was assigned to 97% nucleotide sequence identity OTUs with RDP 
2.4, as described previously. Taxonomically related OTUs that shared a high 
degree of rank order co-linearity of their abundance (Spearman's ρ > 0.7) were 
consolidated as follows: (1) for each family-level taxon that was detected in the 
abundance-filtered data set, a list of OTUs belonging to the family was generated; 
(2) the abundances of OTUs within this family across all samples analysed were 
then correlated with each other to generate Spearman's correlation coefficients 
for each OTU-OTU comparison; and (3) counts for OTUs that shared a high 
degree of co-linearity with each other were then combined to generate the consol- 
idated OTUs according to the scheme that is illustrated in Extended Data Fig. 1 
(see Supplementary Table 5 for a list of OTUs that were consolidated; note that 
OTUs used to generate a ‘consolidated OTU’ shared on average 99.3 ± 0.4% 
(mean ± s.d.) nucleotide sequence identity in their V4-16S rRNA sequences; OTUs 
that did not satisfy the threshold cutoff for co-linearity in abundance were not con- 
solidated). A new OTU table was then generated, consisting of consolidated OTUs 
that were assigned a new OTU sequence identity and all other non-consolidated 
OTUs. These OTU tables were then rarefied to depths of 1,000 reads for the 
BugFACS-related analyses described below, and 5,000 reads for all other analyses. 
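
The consolidation of co-linear OTUs within a family can be sketched as below. The exact grouping scheme is the one illustrated in Extended Data Fig. 1. This sketch assumes that any two OTUs with Spearman's ρ > 0.7 are linked and that linked OTUs are merged transitively, which may differ from the published scheme. All function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant abundance: treat as uncorrelated
    return cov / (vx * vy) ** 0.5


def consolidate(family_otus, threshold=0.7):
    """family_otus: dict otu_id -> abundances across samples (same sample order).

    Returns a list of (member_ids, summed_abundances), one entry per cluster.
    """
    ids = sorted(family_otus)
    parent = {o: o for o in ids}

    def find(o):
        while parent[o] != o:
            parent[o] = parent[parent[o]]
            o = parent[o]
        return o

    # Link every pair whose abundances are co-linear above the threshold.
    for a, b in combinations(ids, 2):
        if spearman(family_otus[a], family_otus[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for o in ids:
        clusters.setdefault(find(o), []).append(o)
    return [
        (members, [sum(v) for v in zip(*(family_otus[m] for m in members))])
        for members in clusters.values()
    ]


family = {
    "o1": [1, 2, 3, 4, 5],
    "o2": [2, 4, 6, 8, 11],
    "o3": [5, 1, 4, 2, 3],
}
result = consolidate(family)
```

In this toy family, `o1` and `o2` rise together and are merged, whereas `o3` remains a separate, non-consolidated OTU.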
BugFACS of human samples. A separate aliquot of a pulverized frozen faecal 
sample was transferred to a pre-weighed 1.5 ml microcentrifuge tube (Life 
Technologies), and processed as described previously with some minor 
modifications. Samples were resuspended in 1 ml of PBS, vortexed at room temperature 
for 5 min (1,500 rotations per minute), and then placed on ice for 5 min 
to allow large particulate matter to settle by gravity. A volume equivalent to 5 mg 
of pulverized faecal material was passed through a nylon 70 µm mesh filter (BD). 
One millilitre of ice cold PBS was added to each filtered sample, which was then 
centrifuged at 10,000g for 3 min (4°C). The resulting supernatant was discarded 
and the cell pellet was resuspended in 100 µl of a 1:50 dilution of goat anti-human 
IgA conjugated to DyLight 650 (Abcam; catalogue number ab96998). Samples 
were subsequently incubated on ice in the dark for 30 min, washed with 1 ml of 
PBS, and resuspended in a 1:4,000 dilution of SytoBC bacterial DNA stain (Life 
Technologies; prepared in HEPES-NaCl buffer (0.9% NaCl, 10 mM HEPES)), 
immediately before introduction into a FACSAria III cell sorter (Becton Dickinson). 

For each sample, 50,000 cytometer ‘events’ were recovered from the ‘Input’, 
‘IgA+’ and ‘IgA−’ gates (for details of the sorting protocol and gating strategies, see 
ref. 8). Additionally, samples of sheath fluid were collected immediately before and 
after sorting to allow assessment of any potential contaminants in fluid lines. Sorted 
fractions and control sheath fluid samples were frozen and stored at −20°C. Each 
BugFACS-sorted fraction was subjected to V4-16S rRNA gene PCR in triplicate 
20 µl reactions. Each reaction contained 2 µl of 10× HiFi PCR Buffer (Invitrogen), 
0.8 µl of 50 mM magnesium sulfate (Invitrogen), 0.4 µl of dNTP mix (Invitrogen), 
0.16 µl of Platinum Taq (Invitrogen), 1 µl of a 5 µM stock of forward PCR 
primer, 1 µl of 5 µM barcoded reverse PCR primer, 2.5 µl of BugFACS sorted 
cells, and 12.1 µl of water. A negative control reaction with no sorted cells was 
included for each barcoded primer. The following PCR conditions were used: 95°C 
for 10 min followed by 31 cycles of 95°C for 30s, 53°C for 30s, and 68°C for 45s, 
followed by 68°C for 2 min. Triplicate reactions were pooled and subjected to 1% 
agarose gel electrophoresis to verify the presence of a PCR product (these gels also 
contained negative control reactions). If any of the three sorted fractions from a 
given sample failed to amplify successfully, PCRs were repeated for 34 cycles for 
all three fractions under the same temperature cycling conditions. PCR-amplified 
fractions were pooled in equal proportion. Although amplicon bands were not visi- 
ble for sheath fluid controls, a set volume of these reactions was also included in the 
sequencing pool. Pooled amplicons were purified with magnetic beads (AMPure 
XP, Agencourt) and subjected to multiplex sequencing (paired-end 250 nucleotide 
reads) on a MiSeq instrument as above. 
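
As a quick arithmetic check on the reaction recipe above, the per-reaction volumes can be summed and scaled to a shared master mix. This is a sketch only: the component labels are shorthand, and pooling the common reagents into a master mix with an optional pipetting overage is an assumption for illustration, not a step described in the protocol.

```python
# Per-reaction volumes in microlitres, as read from the recipe above.
REACTION_UL = {
    "10x HiFi PCR buffer": 2.0,
    "50 mM MgSO4": 0.8,
    "dNTP mix": 0.4,
    "Platinum Taq": 0.16,
    "forward primer (5 uM)": 1.0,
    "barcoded reverse primer (5 uM)": 1.0,
    "sorted cells": 2.5,
    "water": 12.1,
}

# Components added individually to each tube (hypothetical choice).
PER_TUBE = ("barcoded reverse primer (5 uM)", "sorted cells")


def total_volume(recipe):
    """Total volume of one reaction in microlitres."""
    return sum(recipe.values())


def master_mix(recipe, n_reactions, overage=0.0):
    """Scale shared components to n_reactions, with an optional fractional overage."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(vol * factor, 2)
        for name, vol in recipe.items()
        if name not in PER_TUBE
    }


triplicate = master_mix(REACTION_UL, 3)
```

The listed components sum to 19.96 µl, consistent with the nominal 20 µl reaction volume.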

After OTU picking, but before abundance filtering, sheath fluid-contaminating 
OTUs were identified as sequences that constituted >1% of the reads in both the 
pre- and post-sort sheath fluid samples for a given day. Contaminants that were 
identified on more than 2 days were removed from the OTU table. If multiple 
genera within a family-level taxon were identified as sheath contaminants, 
the entire family was removed from the OTU table. This list included the following 
families: Burkholderiaceae, Xanthomonadaceae, Comamonadaceae, Brucellaceae, 
Pseudomonadaceae, Xanthobacteraceae, and Alcaligenaceae. Altogether, OTUs 
belonging to these families accounted for less than 0.05% of all sequences in the 
twin pair, maternal, and mouse faecal samples, and for less than 2% of all sequences 
in samples subjected to BugFACS. IgA indices were subsequently calculated for 
a given taxon in a given sample if that taxon comprised >0.5% of the 16S rRNA 
reads in either the IgA+ or IgA− fraction. 
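
The rule used to flag sheath fluid contaminants can be sketched as follows. The data layout (a dictionary keyed by sorting day, with 'pre' and 'post' sheath samples) and the function names are hypothetical. The family-level removal step described above is not reproduced here.

```python
def over_threshold(counts, min_frac):
    """OTUs making up more than min_frac of the reads in one sheath sample."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {otu for otu, reads in counts.items() if reads / total > min_frac}


def sheath_contaminants(sheath_reads, min_frac=0.01, max_days=2):
    """Return OTUs flagged as contaminants on more than max_days days.

    sheath_reads: dict day -> {"pre": {otu: count}, "post": {otu: count}}.
    An OTU is flagged on a given day only if it exceeds min_frac of the reads
    in BOTH the pre-sort and post-sort sheath samples of that day.
    """
    days_flagged = {}
    for samples in sheath_reads.values():
        flagged = over_threshold(samples["pre"], min_frac) & over_threshold(
            samples["post"], min_frac
        )
        for otu in flagged:
            days_flagged[otu] = days_flagged.get(otu, 0) + 1
    return {otu for otu, d in days_flagged.items() if d > max_days}


runs = {
    "day1": {"pre": {"X": 50, "Y": 50, "Z": 900}, "post": {"X": 40, "Z": 960}},
    "day2": {"pre": {"X": 30, "Z": 970}, "post": {"X": 20, "Z": 980}},
    "day3": {"pre": {"X": 100, "W": 900}, "post": {"X": 100, "W": 900}},
}
contaminants = sheath_contaminants(runs)
```

In this toy example `X` is flagged on three days and is removed, `Z` is flagged on only two days and is retained, and `Y` never appears in a post-sort sample.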
RF modelling. RF modelling of gut microbiota development was performed with 
the ‘randomForest’ package in R. Input data consisted of OTU data rarefied to a 
depth of 5,000 V4-16S rRNA reads per faecal sample. Feature importance scores for 
each OTU in the data set were calculated by randomly selecting one co-twin from 
half of the twin pairs (n = 20 individuals). An RF model was generated from this 
subset of data. Randomization and this process of model construction were per- 
formed 100 times (100 trees per model). Feature importance scores were extracted 
from each model, averaged across the 100 models, and used to rank the OTUs from 
highest to lowest feature importance. 
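
The resampling loop behind the averaged feature importance scores can be sketched as below. The model fit itself is left to a caller-supplied function (`fit_importance`, a hypothetical placeholder standing in for a call to a random forest implementation); only the co-twin selection and the averaging are shown, and the seed handling is an assumption.

```python
import random


def averaged_importance(twin_pairs, fit_importance, n_models=100, seed=0):
    """Average feature importance over repeated random training subsets.

    twin_pairs: list of (twin_a_id, twin_b_id) tuples.
    fit_importance: callable taking a list of individual ids and returning
        a dict otu_id -> importance score (hypothetical interface).
    Returns (otus ranked from highest to lowest mean importance, mean scores).
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_models):
        # Pick half of the twin pairs, then one co-twin from each picked pair.
        chosen_pairs = rng.sample(twin_pairs, len(twin_pairs) // 2)
        individuals = [rng.choice(pair) for pair in chosen_pairs]
        for otu, score in fit_importance(individuals).items():
            totals[otu] = totals.get(otu, 0.0) + score
    means = {otu: total / n_models for otu, total in totals.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    return ranked, means


# Toy stand-in for a model fit: records subset sizes and returns fixed scores.
pairs = [(f"a{i}", f"b{i}") for i in range(40)]
subset_sizes = []


def fake_fit(individuals):
    subset_sizes.append(len(individuals))
    return {"otu1": 2.0, "otu2": 5.0}


ranked, means = averaged_importance(pairs, fake_fit, n_models=10)
```

With 40 twin pairs, each iteration trains on 20 individuals, none of whom are co-twins of one another.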

To estimate the number of OTUs needed to build a sparse model, a new set of 
RF models was generated by selecting one co-twin from half of the twin pairs in 
the cohort as above, and evaluating the performance of the model (Spearman's 
ρ and the adjusted r² of a linear model as metrics) when applied to (1) the individuals 
used to generate the model, (2) their co-twins, and (3) all unrelated faecal 
samples (‘training’, ‘co-twin’, and ‘test’ sets, respectively). A series of models was 
built with increasing numbers of OTUs, starting with the OTU assigned the highest 
feature importance score, and sequentially adding OTUs in decreasing order of 
feature importance. For each model of a different size, ten randomizations were 
performed; performance of the model was averaged across the independent rep- 
licates to generate standard error measurements. The subset of 25 OTUs with 
highest rank order of feature importance scores was used to create a sparse model. 
This sparse model, generated from samples collected during the first 36 months 
of postnatal life, was applied to 16S rRNA data sets generated from faecal samples 
collected between 1 and 24 months of age to predict chronological age in members 
of the ‘training’, ‘co-twin’, and ‘test’ subsets as described above. A parallel RF-derived 
model was generated from IgA index data for the 30 OTUs shown in Fig. 1a. If a 
taxon was not detected in either the IgA+ or IgA− fraction, it was given a value 
of 0 before model construction. This model was applied to the ‘training’, ‘co-twin’, 
and ‘test’ sets. 

OTUs were reassigned to incorporate data sets from all three countries (USA, 
Bangladesh, and Malawi), resulting in a second set of consolidated OTUs (see 
Supplementary Table 9). Feature importance scores were calculated by iteratively 
regressing each country’s training set of samples 100 times against chronological 
age (100 trees per model); OTUs were ranked by the mean values of their feature 
importance scores across the 100 models. The 25 most age-discriminatory OTUs 
were used to generate each respective country’s sparse RF model. Each model was 
used to predict the microbiota ages of members of that country’s corresponding 
test set, as well as the microbiota ages of all members of the healthy cohorts from 
each of the other two countries. Spearman’s correlation coefficients were generated 
by building each sparse RF model ten times, correlating predicted microbiota ages 
with chronological ages, and averaging the coefficients. 

Animal studies. All experiments involving mice were performed according to 
protocols that were in compliance with ethical regulations and approved by the 
Washington University Animal Studies Committee. No inclusion or exclusion cri- 
teria were established; all animals studied were included in our analyses. 
Gnotobiotic mouse husbandry. Germ-free 5-week-old male C57BL/6J mice (Mus 
musculus) were maintained on a strict 12 h light cycle (lights on at 6:00) in flexible 
plastic gnotobiotic isolators (Class Biologically Clean). Mice were weaned onto an 
autoclaved, standard mouse chow diet low in fat and rich in plant polysaccharides 
(B&K Universal; diet 7378000). Two days before introduction of human donor 
faecal samples by gavage, 5-week-old animals were switched to the human infant 
formula diet. 

Diets. The infant formula diet consisted of a mixture of Similac ‘Sensitive with Iron’ 
infant formula and unflavoured whey protein powder (GNC) mixed at a ratio of 
11:1 (w/w). This powdered diet was reconstituted in the gnotobiotic isolator on a 
daily basis with sterile water. The infant formula plus fruit and vegetable diet was 
based on a survey of the fruits and vegetables most commonly consumed by infants 
transitioning to complementary foods, and consisted of isocaloric amounts of 
the powdered infant formula diet and a mixture of 1:1:1:1 ratio (by mass) of sweet 
potatoes, green beans, bananas, and apples (Gerber 1st Foods). Formula was irra- 
diated as a powder. Fruits and vegetables were irradiated in their original plastic 
containers before the start of the experiment (25-30 Gy; Steris Isomedix) and 
mixed with the irradiated formula powder. When mice were consuming infant 
formula diet alone, fresh food was prepared daily within the gnotobiotic isolator 
and presented to animals in sterile plastic trays that were changed daily. When 
animals were given the mixture of formula and fruits/vegetables, food was prepared 
every other day, and new aliquots given to animals in fresh trays daily. Bedding was 
changed with each phase of the diet oscillation; within a given diet phase, bedding 
was changed every 2-3 days. 

Microbiota transplants. A given pulverized frozen human faecal sample 
(353 ± 184 mg; mean ± s.d.) was transferred to an anaerobic Coy chamber (atmosphere 
75% N2, 20% CO2, 5% H2) in a 2 ml Axygen screw topped tube. The tube 
was then opened and its contents were transferred to a 50 ml conical shaped polypropylene 
tube (Falcon). The faecal material was suspended in 10 ml of sterile 
PBS supplemented with 0.1% L-cysteine (Sigma) by vortexing with sterile glass 
beads (2 mm in diameter). The suspension was passed through a nylon 100 µm 
mesh filter (BD) and the filtrate was mixed with an equal volume of 30% glycerol 
in PBS/0.1% cysteine. Aliquots (1.2 ml) of this suspension were placed in amber 
glass vials, each of which was sealed with a crimp top, and frozen at −80°C. Tubes 
were thawed, and transferred into gnotobiotic isolators (with surface sterilization 
achieved by treatment with Clidox). Aliquots (200 µl) were then introduced into 
each germ-free mouse in a given experimental group by oral gavage. A total of 
38 animals were used for this study (n = 4 or 5 mice per donor microbiota). The 
size of each treatment group was not based on a formal power calculation but was 
informed by our previous work described in ref. 8. There was no randomization of 
mice for this study; male C57BL/6J animals in each group were age- and weight- 
matched before gavage. Investigators were not blinded to the donor microbiota. 


BugFACS of mouse faecal samples. The protocol used was similar to that 
described above for human faecal samples with several modifications. Faecal pel- 
lets were resuspended in PBS, vortexed, and a volume equivalent to 5 mg of faecal 
material was passed through a nylon 70 µm mesh filter. After washing with PBS, 
cells were incubated for 30 min on ice in the dark with a polyclonal goat antibody 
directed against mouse IgA conjugated to DyLight 650 (Abcam; catalogue number 
ab97014; diluted 1/50 in PBS/0.5% (w/v) bovine serum albumin). On each day that 
BugFACS was performed, a positive control of pooled material from all mouse 
faecal samples analysed on that day and stained with anti-mouse IgA antibody was 
used to verify staining, while a negative control of the same pooled faecal material 
stained with the anti-human IgA antibody (conjugated to DyLight 650, see above) 
was used as an isotype control. 

Statistics. Statistical analyses, RF modelling, generation of plots, OTU con- 
solidation, and OTU table rarefaction were performed in the R programming 
environment (R version 3.1.1) or Prism 6.0. For presentations of data in which 
group means are compared, confidence in mean values is displayed as the s.e.m. 
Mann-Whitney U-tests and Student's t-tests were all two-tailed. FDR correction of 
P values was performed with the Benjamini-Hochberg procedure. Indicator species 
analysis was performed with the ‘indicspecies’ package in R. PERMANOVA tests 
were performed with the ‘vegan’ package in R. For PERMANOVA of IgA indices, 
the matrix of Pearson's product moment correlation coefficients was converted to a 
dissimilarity matrix with the formula $X_{\mathrm{dissimilarity}} = (1 - X_{\mathrm{similarity}})/2$, where X represents 
a given sample-to-sample comparison. We performed two separate analyses: 
in the first, only the effects of zygosity, delivery mode, age bin, and twin pair were 
considered; a second analysis was performed on the subset of samples for which 
feeding data were available to evaluate the effects of milk feeding practices, with age 
bin and twin pair included as covariates. In both cases, 999 permutations were performed. 
Linear mixed-effects modelling was performed with the ‘lmerTest’ package 
in R. Sex, delivery mode, zygosity, and feeding predominance were tested as fixed 
effects, with age bin, twin pair, and the infant/child study identifier treated as nested 
random effects. Similar results were obtained with either the Satterthwaite or the 
Kenward-Roger approximation for denominator degrees of freedom. 
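
Two of the computations named in this section, the conversion of Pearson correlations to dissimilarities and the Benjamini-Hochberg adjustment, can be sketched in a few lines. This is an illustrative re-implementation with hypothetical function names; the analyses themselves were run in R or Prism as stated above.

```python
def pearson(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


def dissimilarity_matrix(profiles):
    """Apply (1 - r) / 2 to every sample-to-sample Pearson correlation.

    profiles: list of equal-length lists (for example, IgA indices per sample).
    The result ranges from 0 (r = 1) to 1 (r = -1).
    """
    n = len(profiles)
    return [
        [(1 - pearson(profiles[i], profiles[j])) / 2 for j in range(n)]
        for i in range(n)
    ]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


d = dissimilarity_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Perfectly correlated samples receive a dissimilarity of 0 and perfectly anti-correlated samples a dissimilarity of 1, so the transformed matrix can be passed to a distance-based test.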
Extended Data Figure 1 | Method used for OTU consolidation. a, OTU 
consolidation was performed to limit pseudo-duplication of taxa. ‘Counts’ 
on the y axis refer to the number of OTU-OTU correlations falling 
within a given range of Spearman’s correlation values shown on the x axis 
(n = 341,640 OTU-OTU comparisons; see Methods for details). b, A 
subset of the matrix used to derive the distribution shown in a illustrates 
how OTUs within a single family-level taxon are consolidated. In this 
example, three clusters composed of OTUs with Spearman's correlation 
coefficients of >0.7 are identified and the abundances of their constituent 
OTUs are summed. Each OTU cluster is assigned an identifier number 
with the prefix ‘C’ and given a consensus taxonomic assignment (see 
Supplementary Table 5). Note that the OTUs used to generate a given 
‘consolidated OTU’ shared 99.3 ± 0.4% (mean ± s.d.) nucleotide sequence 
identity in their V4-16S rRNA nucleotide sequences. 
Extended Data Figure 2 | Modelling development of the gut microbiota 
during the first 24 months of life in healthy twins. a, To estimate the 
number of OTUs needed to maximize predictive accuracy, OTUs were 
iteratively added to a series of RF models, starting with the OTU with the 
highest feature importance score and adding additional OTUs in order 
of decreasing feature importance. To evaluate performance of the model, 
members of the 40 twin cohort were randomly assigned to ‘training’, 
‘co-twin of training’ and ‘test’ sets (red, green, and blue, respectively) ten 
times (for each point, mean ± s.e.m. values are plotted). The dashed vertical line 
indicates performance of a 25 OTU model across the three different sets. 
b, Predicted age was calculated for all faecal microbiota samples with 
a sparse 25 OTU RF-generated model. Chronological versus model- 
predicted age is plotted for each of the three data subsets (n = 1,477 faecal 
samples). The inset shows mean ± s.d. values for predicted microbiota 
age of samples in each monthly age bin. c, Heatmap of mean abundances 
over the first 24 months of life for the 25 OTUs used to generate the sparse 
model. Taxa are normalized by row, with hierarchical clustering (complete 
linkage; n = 1,477 faecal samples). 
Extended Data Figure 3 | Similarity of faecal microbiota composition 
within and between twin pairs. Similarity in composition of the faecal 
microbiota within and between twin pairs was analysed with unweighted 
UniFrac distance calculated before OTU consolidation. Statistical 
significance was evaluated with the paired Wilcoxon test for twin-twin 
versus twin-unrelated comparisons. Mean values ± s.e.m. are plotted 
(n = 205 paired comparisons). ***P < 0.001. The results indicate that the 
overall phylogenetic composition of the faecal microbiota is more similar 
in infants/children sharing a common living environment and genetic 
background than between unrelated individuals; this is apparent as early 
as the first month of life and does not change significantly over the ensuing 
23 months. 
Extended Data Figure 4 | Feeding status and microbiota composition 
during the first year of life. a, The proportion of all feedings on the day 
of faecal sampling that consisted of either formula or breast milk (n = 746 
observations). b, Microbiota age, defined by the sparse RF-derived model, 
compared for participants that were predominantly breast fed (>50% 
of milk feeding) or predominantly formula fed across different age bins. 
Mean values ± s.e.m. are plotted (n = 681 faecal samples). *P < 0.05; 
**P < 0.01; ***P < 0.001 (Mann-Whitney U-test). c, Aggregate percentage 
relative abundance of age-discriminatory bifidobacteria included in the 
sparse RF-derived model; differences in their representation in the faecal 
microbiota as a function of breast or formula feeding evaluated in each age 
bin are shown. Horizontal lines within each column represent the median 
values; the horizontal dashed line represents the lower limit of detection. 
*P < 0.05; **P < 0.01; ***P < 0.001 (Mann-Whitney U-test comparing 
samples obtained from breast versus formula fed individuals; n = 681 
faecal samples). 
Extended Data Figure 5 | Further characterization of IgA responses to 
members of the microbiota in the USA twin cohort. a, Evidence that IgA 
indices are independent of relative abundance. IgA indices for all OTUs 
are plotted against their relative abundance in the ‘input’ fraction for all 
infant/child, maternal, and gnotobiotic mouse faecal samples analysed 
by BugFACS (n = 22,713 comparisons). b, Mean IgA indices ± s.e.m. for 
two OTUs whose IgA targeting varied significantly with age across all 
80 individuals in the twin birth cohort (FDR-corrected Kruskal-Wallis 
test, P < 0.05). c, Variance of IgA indices as a function of age. A total of 
26 OTUs were detected in at least two individuals in all of the age bins 
surveyed. The variance in their IgA indices was then calculated and the 
non-parametric repeated-measures Friedman test was used to test for 
statistical significance (P < 0.0001). Mean values ± s.e.m. are plotted. 
Extended Data Figure 6 | Specificity of targeting and temporal variation 
in the prevalence of IgA-targeted or non-targeted taxa. Specificity values 
from the indicator species analysis were calculated across all time points 
for the 30 OTUs identified as consistently IgA-targeted or non-targeted 
in Fig. 1a. Prevalence of the taxa, defined as detection in either the IgA+ 
or IgA− fraction, was plotted against the percentage of samples in which 
a given taxon had a positive or negative IgA index (n = 4,186 IgA index 
values analysed). The results reveal a group of OTUs that increased in 
prevalence over the course of the first 2 years of postnatal life and had 
very high ‘specificity’ for either the IgA+ or IgA− fraction (that is, across 
the population of faecal samples, most 16S rRNA reads for a given OTU 
were detected in one of the two fractions). This group included R. torques 
A second group of OTUs became more prevalent with age but members 
had a much weaker, albeit statistically significant, association with one or 
the other sorted fraction (for example, R. gnavus OTU C.4 and B. vulgatus 
OTU C.15). A third group of OTUs were highly specific for a given 
sorted fraction but were only detected in a minority (<20%) of children. 
This last group included two strongly IgA-targeted OTUs assigned to 
A. muciniphila (OTU 588471 and OTU 4306262). Intriguingly, these two 
OTUs co-occurred just once among the 176 BugFACS samples in which 
A. muciniphila was detected (P < 0.0001, χ² test). 
Extended Data Figure 7 | Effects of diet on gut mucosal IgA responses 
to members of the microbiota. The analysis was constrained to those 
faecal samples where a diet history had been collected within 10 days 
of procuring the specimens (n = 276). After FDR correction with the 
Benjamini-Hochberg procedure, IgA targeting of 2 of the 30 taxa 
identified in Fig. 1a varied significantly as a function of breast versus 
formula feeding. Each circle represents results from a given faecal sample. 
Samples are colour-coded on the basis of the type of milk diet being 
consumed by the donor at the time of faecal sampling. Horizontal lines in 
each column represent mean values. *P < 0.05 (Mann-Whitney U-test of 
the differences between breast and formula fed). 
Extended Data Figure 8 | Diet-dependent changes in composition of 
the faecal microbiota of gnotobiotic mice. Indicator species analysis 
was used to identify taxa from the RF-derived model of gut microbiota 
maturation whose abundances varied consistently by diet treatments 
(n = 9,999 permutations with ‘mouse’ as the grouping variable). The top 
60 ranked OTUs in the model (on the basis of their feature importance 
scores) were included in the analysis; those OTUs with statistically 
significant diet-dependent partitioning (P < 0.05) after FDR correction 
and with an indicator value >0.5 are shown, ranked from highest to lowest 
indicator value for infant formula-discriminatory (upper portion of figure) 
and ‘infant formula plus fruits and vegetables’-discriminatory (lower 
portion of figure) (see Supplementary Table 19 for results of the indicator 
species analysis). Mean values for relative abundances in the faecal 
microbiota at each time point are plotted ± s.d. 
The Brazilian Zika virus strain causes birth defects in experimental models 


Fernanda R. Cugola, Isabella R. Fernandes, Fabiele B. Russo, Beatriz C. Freitas, João L. M. Dias, Katia R Guimaraes, 
Cecilia Benazzato, Nathalia Almeida, Graciela C. Pignatari, Sarah Romero, Carolina M. Polonio, Isabela Cunha, 
Carla L. Freitas, Wesley N. Brandão, Cristiano Rossato, David G. Andrade, Daniele de P. Faria, Alexandre T. Garcez, 
Carlos A. Buchpigel, Carla T. Braconi, Erica Mendes, Amadou A. Sall, Paolo M. de A. Zanotto, Jean Pierre S. Peron, 
Alysson R. Muotri & Patricia C. B. Beltrao-Braga 


Zika virus (ZIKV) is an arbovirus belonging to the genus Flavivirus 
(family Flaviviridae) and was first described in 1947 in Uganda 
following blood analyses of sentinel Rhesus monkeys. Until the 
twentieth century, the African and Asian lineages of the virus did not 
cause meaningful infections in humans. However, in 2007, vectored 
by Aedes aegypti mosquitoes, ZIKV caused the first noteworthy 
epidemic on the Yap Island in Micronesia. Patients experienced 
fever, skin rash, arthralgia and conjunctivitis. From 2013 to 2015, 
the Asian lineage of the virus caused further massive outbreaks in 
New Caledonia and French Polynesia. In 2013, ZIKV reached Brazil, 
later spreading to other countries in South and Central America. 
In Brazil, the virus has been linked to congenital malformations, 
including microcephaly and other severe neurological diseases, such 
as Guillain-Barré syndrome. Despite clinical evidence, direct 
experimental proof showing that the Brazilian ZIKV (ZIKVBR) 
strain causes birth defects remains absent. Here we demonstrate 
that ZIKVBR infects fetuses, causing intrauterine growth restriction, 
including signs of microcephaly, in mice. Moreover, the virus infects 
human cortical progenitor cells, leading to an increase in cell 
death. We also report that the infection of human brain organoids 
results in a reduction of proliferative zones and disrupted cortical 
layers. These results indicate that ZIKVBR crosses the placenta 
and causes microcephaly by targeting cortical progenitor cells, 
inducing cell death by apoptosis and autophagy, and impairing 
neurodevelopment. Our data reinforce the growing body of 
evidence linking the ZIKVBR outbreak to the alarming number of 
cases of congenital brain malformations. Our model can be used to 
determine the efficiency of therapeutic approaches to counteracting 
the harmful impact of ZIKVBR in human neurodevelopment. 

The recent increase in microcephaly cases in Brazil has been associated 
with the outbreak of Zika virus (ZIKV), originating from an 
Asian-lineage strain that can be spread by Aedes aegypti mosquitoes. 
The Brazilian ZIKV (ZIKVBR) has been detected in the placenta and 
amniotic fluid of two women with microcephalic fetuses and in 
the blood of microcephalic newborns, suggesting that the virus 
can cross the placental membrane. The virus has also been identified 
in the brains and retinas of microcephalic fetuses. However, there 
is no direct evidence of the mechanism by which ZIKVBR causes brain 
malformations. A previous study revealed that the African ZIKV 
(ZIKVAF, strain MR-766) has the ability to infect human skin cells. 
Neurons and astrocytes in the mouse brain could also be infected, 
inducing hippocampal degeneration and necrosis of pyriform cells 
7 days post-infection. More recently, ZIKVAF was also shown to infect 
human pluripotent stem cell (hPSC)-derived neural progenitor cells 
(NPCs) in vitro, which induced apoptotic cell death. These studies 
were performed using the MR-766 ZIKVAF strain isolated in Uganda in 
1947, which shares 87-90% sequence similarity with the Polynesian and 
Brazilian isolates. Nevertheless, because severe congenital malformations 
were not reported for African isolates, there is a need to study the 
association of ZIKV with microcephaly and birth defects with isolates 
from affected localities, such as the ZIKVBR strain. Therefore, there is 
an urgent need to develop model systems to determine the relationship 
between infection with the ZIKVBR strain and birth defects. 

We used ZIKV^BR isolated from a febrile case in the state of Paraiba,
in the northeast of Brazil in 2015 (see Methods). To evaluate the causal
relationship between ZIKV^BR and birth defects, including brain
malformation during development, we first used a murine experimental
model in which SJL and C57BL/6 pregnant mice were infected with
ZIKV^BR, evaluating newborns immediately after birth (Extended Data
Fig. 1a). Notably, similar to ZIKV^BR-infected human newborns,
pups born from the SJL ZIKV^BR-infected pregnant females displayed
clear evidence of whole-body growth delay or intra-uterine growth
restriction (IUGR) compared to pups born from the mock-infected
controls (Fig. 1a, b). Using a qPCR assay, we confirmed the presence of
ZIKV^BR genomic RNA in several tissues of newborn animals, observing
significantly more viral RNA in the brain, confirming the neurotropic
nature of the virus (Fig. 1c).

Microcephaly is perhaps the most dramatic of the birth defects
reported in ZIKV^BR-infected newborns. Mouse models often fail
to reproduce the severely reduced brain size and pathological
alterations found in human patients, probably owing to significant
differences in gestation time and brain development between the two
species. Nevertheless, upon close inspection of the ZIKV^BR-infected
mouse brains, we noticed cortical malformations in the surviving
animals, with reduced cell number and cortical layer thickness, signs
associated with microcephaly in humans (Fig. 1d-f). At a cellular level,
the neurons in the cortex, thalamus and hypothalamus displayed a
'vacuolar nuclei' appearance. This morphology was characterized by
central emptiness and a marginalized chromatin pattern with nuclear
debris, suggesting ongoing cellular death (Fig. 1d and Extended Data
Fig. 2). In addition, we also noticed apparent ocular abnormalities,
reminiscent of those observed in human patients (Fig. 1g). Thus, SJL
infected pups presented congenital malformations compatible with
ZIKV^BR-infected human newborns. While the impact of ZIKV^BR
Figure 1 | ZIKV^BR infection in SJL mice. a, SJL pups born with
IUGR. Scale bar, 1 cm. b, Total body weight, crown-rump and skull
measurements in pups born from infected animals (n = 6 pups, comprising
3 mice from 2 separate litters; error bars, s.e.m.; t-test, **P < 0.01).
c, ZIKV^BR RNA detected in SJL pup tissues (n = 6 pups, comprising 3 mice
from 2 separate litters; error bars, s.e.m.; t-test). d, Histopathological
aspect of the cortical organization (brackets) in infected brains, including
intranuclear vacuoles, and 'empty' nuclei aspect with chromatin
margination in neurons (arrowheads). Scale bar, 100 µm (left panels),
50 µm (middle panels) and 10 µm (right panels). e, ZIKV^BR-infected brains


infection in the SJL mice was notable, no major body alterations were
detected in pups from the infected C57BL/6 animals (Extended Data
Fig. 1a-c). To exclude potential minor alterations in the C57BL/6
mice, computed tomography (CT) scans were performed to quantify
the skull/body volumes. No significant changes in the pups born from
the ZIKV^BR-infected C57BL/6 mothers were observed compared to
the controls (Extended Data Fig. 1d, e). A diagnostic qPCR assay of
the pups from the ZIKV^BR-infected animals was negative, suggesting
that the virus did not cross the placenta in the C57BL/6 mouse strain
(Extended Data Fig. 1f). To elucidate the type of cell death induced by
ZIKV^BR in the brains of SJL pups, we used a qPCR array to distinguish
the different molecular pathways involved. Our data clearly indicate that
ZIKV^BR infection influences the regulation of genes intimately linked to
autophagy and apoptosis, such as upregulation of Bmf, Irgm1, Bcl2, Htt,
Casp6 and Abl1. Conversely, Gadd45a, Tnfrsf11b, Fasl, Atg12, Bcl2l11
and Dffa were highly suppressed (Fig. 1h and Extended Data Fig. 1g, h).

Next, we evaluated the impact of ZIKV^BR infection in human neural
cells derived from hPSCs to establish a correlation between ZIKV and
impairment of neurogenesis (Extended Data Fig. 3a). We generated
human cortical NPCs and neurons from healthy donor hPSCs. First,
we determined the expression levels of the TYRO3, AXL and MERTK
(TAM) receptor tyrosine kinases in NPCs and neurons. This is an
displayed a reduced cortical layer thickness (n = 6 pups, comprising 3 mice
from 2 separate litters; error bars, s.e.m.; t-test, ***P < 0.001). Infected
brains have fewer cells per layer (n = 6 pups, comprising 3 mice from
2 separate litters; error bars, s.e.m.; t-test, **P < 0.1). f, ZIKV^BR-infected
cortical neurons have pronounced nuclei (diameter) (cortical, n = 31; deep
cortical, n = 21 and medulla, n = 41 nuclei; error bars, s.e.m.; two-way
ANOVA, ****P < 0.001). g, Ocular malformations (arrow) in the
ZIKV^BR-infected pups. h, Cell death gene expression signature in the
brains of ZIKV^BR-infected pups (n = 2 mice per group; threshold = twofold).


important family of receptors used for cell invasion by the Dengue
virus and ZIKV, and AXL has been recently proposed as a candidate
receptor for ZIKV infection during neurogenesis. Mock-infected
NPCs expressed higher levels of AXL when compared to mock-infected
neurons (Fig. 2a). However, no significant changes in expression
levels were observed upon ZIKV infection in NPCs (Fig. 2b). We then
investigated the impact of ZIKV^BR and ZIKV^AF infection in NPCs and
neurons. After infection using a viral multiplicity of infection (MOI)
of 10, ZIKV^BR particles were detected inside the NPCs and neurons at
several stages of viral assembly using transmission electron microscopy
(Fig. 2c). Immunostaining performed on NPCs and neurons at both
an MOI of 10 and an MOI of 1 revealed production of viral protein
aggregates (Fig. 2d and Extended Data Fig. 3b, c). With an MOI of 10,
the amount of ZIKV^BR particles in the NPC and neuron culture
supernatant increased over time, suggesting the efficient production of
infectious viral particles (Fig. 2e, f). With an MOI of 1, NPCs, but not
neurons, continued to produce ZIKV^BR RNA in the culture supernatant
(Extended Data Fig. 3d, e). At 96 h post-infection we observed
significant cell death in NPC cultures using fluorescence-activated cell
sorting (FACS) analyses. We quantified cell death over time in NPC
cultures and detected an increase in the number of apoptotic/necrotic
cells both in the ZIKV^BR- and ZIKV^AF-treated cultures compared to the
Figure 2 | ZIKV infection in vitro. a, Relative expression of TAM
receptors (n = 2 technical replicates from two pooled different donors;
error bars, s.e.m.; t-test; ***P < 0.01). b, Expression of TAM receptors
in NPCs after ZIKV^BR infection (MOI = 10) at 48 h post-infection
(n = 2 technical replicates from two pooled different donors; error
bars, s.e.m.; one-way ANOVA, P > 0.05). c, TEM detection of ZIKV^BR
viral particles 24 h post-infection at MOI = 10 (red arrowheads) inside
NPCs (top panels) and neurons (bottom panels). Yellow arrowheads,
viral factories; white arrowheads, immature viral particles. Scale bars/
magnifications, 0.5 µm/40,000x (top left); 200 nm/80,000x (top right);
0.2 µm/50,000x (bottom left); 50 nm/300,000x (bottom right).
d, Immunofluorescence revealed susceptibility to infection in NPCs
and neurons with ZIKV^BR (MOI = 10) at 24 h post-infection. Scale
bar, 25 µm. e, ZIKV^BR replication dynamics in NPCs (MOI = 10; n = 2




mock-infected cultures at an MOI of 10 (Fig. 2g), but not at an MOI of 
1 during the same timeframe (Extended Data Fig. 3f). No difference 
in neuronal cell death was observed between the two ZIKV strains at 
an MOI of 10 and an MOI of 1 (Extended Data Fig. 3g, h). 

Next, we challenged two three-dimensional neural cell culture
systems, neurospheres and cerebral organoids, with ZIKV^BR and ZIKV^AF.
We generated neurospheres by growing human NPCs in suspension.
While the mock-infected control neurospheres continued to grow over
time, the ZIKV^BR-infected neurospheres (MOI of 10) displayed evident
morphological abnormalities with signs of cell death (Fig. 2h). The
neurospheres infected with ZIKV^BR were significantly smaller
than the mock-infected and ZIKV^AF-infected neurospheres at 96 h
post-infection (Fig. 2i). A less dramatic effect was observed at an MOI
of 1, where both ZIKV^BR and ZIKV^AF infection reduced the size of the
neurospheres compared to mock-infected controls (Extended Data Fig. 4a, b).
These observations were paired with increased ZIKV^BR replication in these
cultures at both MOIs of 10 and 1 (Fig. 2j and Extended Data Fig. 4c).
technical replicates from RNA of two different donors). f, ZIKV^BR
replication dynamics in neurons (MOI = 10; n = 2 technical replicates
from RNA of two different donors). g, NPC death measured by FACS
with two different cell gating sizes (P1 and P2). Apoptosis (left panel),
necrosis (middle panel) and late apoptosis (right panel) (MOI = 10;
48 h post-infection; n = 2 technical replicates from two different donors;
error bars, s.e.m.; two-way ANOVA, *P < 0.05). PI, propidium iodide.
h, Representative images of human neurospheres infected with ZIKV^BR
(MOI = 10; 96 h post-infection). Scale bar, 200 µm. i, Alterations in
neurosphere diameter over time (MOI = 10; n = 25 neurospheres from
two different donors for each time point; error bars, s.e.m.; ANOVA,
**P < 0.0001). j, ZIKV replication dynamics in neurospheres
(MOI = 10; n = 2 technical replicates from two different donors).


These results suggest that ZIKV^BR induces cell death in human NPCs,
impairing the growth and morphogenesis of healthy neurospheres
(Extended Data Fig. 4d-f).

The majority of the described cases of ZIKV^BR-infected newborns
(95%) had malformations of cortical development.
Thus, we also used brain organoids generated from hPSCs and
human embryonic stem cells to evaluate the impact of ZIKV^BR
on human cortical development. In the following experiments,
alongside the ZIKV^AF and mock infection, we added the Yellow
Fever virus (YFV), a slow-replicating attenuated live-vaccine
Flavivirus that has a low risk of causing neuropathy. Cerebral
organoids are three-dimensional, self-organized, stem-cell-derived
models that recapitulate the first trimester of human
neurodevelopment, including the molecular and cellular architecture
reminiscent of the fetal cortex. Organoids show some degree of
lamination and resemble the human neocortex in terms of the spatial
relationships of the progenitor populations, defined here as a
proliferative ventricular
Figure 3 | Cortical alterations in human brain organoids infected with
ZIKV. a, Representative image of a human cerebral organoid showing
the marginal zone (MZ), cortical plate (CP) and ventricular zone (VZ),
delineated by dotted white lines. Scale bar, 200 µm. b, Representative
images of the CP stained for CTIP2, TBR1, MAP2 or TUJ1 (neurons).
Scale bar, 50 µm. c, Representative images of the proliferative regions in
the VZ stained for KI67, PAX6, and cleaved caspase-3 (CASP3). Scale
bar, 50 µm. d, Percentage of TBR1-positive cells in relation to
mock-infected controls (dotted line) (MOI = 0.1; n = 3 replicates from three
human cell lines; error bars, s.e.m.; ANOVA, **P = 0.0025). e, Percentage


zone, post-migratory neurons in the cortical plate and a marginal zone
(Fig. 3a-c). We infected organoids with ZIKV^BR, ZIKV^AF and YFV
using an MOI of 0.1 and compared them to mock-infected organoids at
24 and 96 h post-infection. We quantified the percentage of different
subtypes of cortical neurons, TBR1-positive or CTIP2-positive cells
(deep-layer V/VI) in the cortical plate, finding a significant
reduction in their number and respective cortical plate thickness in the
ZIKV^BR-infected organoids compared to the others. A significant
reduction in TBR1-positive cells was observed in the ZIKV^BR-infected
organoids at 96 h post-infection, while CTIP2-positive cells were
significantly reduced in both ZIKV^BR- and ZIKV^AF-infected organoids at the
same time point (Fig. 3d, e and Extended Data Fig. 5a-f). Consistent
with the reduced population of cortical neurons, we observed a
significant decrease in PAX6-positive cells (dorsal forebrain progenitor
cells) following ZIKV infection (Fig. 3f and Extended Data Fig. 5d).
Dividing cells in the ventricular zone, detected by the population of
KI67- and SOX2-positive cells, were only significantly reduced in the
ZIKV^BR-infected organoids (Fig. 3g, h and Extended Data Fig. 5d). As
observed in our other in vitro models, the number of apoptotic cells
(cleaved caspase 3- and TUNEL-positive cells) was increased in
organoids infected with ZIKV^BR, possibly explaining the decrease in the
NPC population (Fig. 3i and Extended Data Fig. 5g, h).

ZIKV^AF was derived from a zoonotic agent associated with
primates in Africa, whereas ZIKV^BR is an isolate from a lineage adapted
to human-to-human transmission for the past 70 years. As an entry
of CTIP2-positive cells (MOI = 0.1; n = 3 replicates from three human
cell lines; error bars, s.e.m.; ANOVA, ****P < 0.001; *P = 0.0430).
f, Percentage of PAX6-positive cells (MOI = 0.1; n = 3 replicates from three
human cell lines; error bars, s.e.m.; ANOVA, *P = 0.0221). g, Percentage
of KI67-positive cells (MOI = 0.1; n = 3 replicates from three human cell
lines; error bars, s.e.m.; ANOVA, ****P < 0.001). h, Percentage of
SOX2-positive cells (MOI = 0.1; n = 3 replicates from three human cell lines;
error bars, s.e.m.; ANOVA, ****P = 0.0003). i, Percentage of
cleaved-caspase-3-positive cells (CASP3) (MOI = 0.1; n = 3 replicates from
three human cell lines; error bars, s.e.m.; ANOVA, ****P = 0.0002).


point to establishing the potential mechanistic adaptive differences
between ZIKV^BR and ZIKV^AF towards human cells, we also generated
brain organoids from non-human primate (chimpanzee) pluripotent
stem cells. We infected these chimpanzee cerebral organoids (MOI
of 0.1) and measured the impact on cortical neurons at 24 and 96 h
post-infection. ZIKV^BR failed to reduce the percentage of either TBR1-
or CTIP2-positive cells in non-human primates (Extended Data
Fig. 5i, j). Consistently, the kinetics of infection were different between
the two ZIKV isolates. While ZIKV^BR did not replicate in the chimpanzee
organoids, ZIKV^AF seemed well adapted to these cells (Extended Data
Fig. 5k).

To evaluate the causal relationship between ZIKV congenital infection
and birth defects, we used a murine experimental model, in which
pregnant SJL and C57BL/6 mice were infected with ZIKV^BR. Notably,
the SJL strain was susceptible to viral infection of fetal tissues, causing
severe IUGR that resembled the affected Brazilian newborns, including
signs of microcephaly, such as cortical malformations. We also showed
that ZIKV^BR induced apoptosis and autophagy in the mouse neural
tissue. This is in accordance with the literature, as it has been
previously demonstrated that ZIKV induces and localizes in autophagic
phagosomes. To our knowledge, this is the first report showing a
gene expression profile that correlates to cell death in the brains of
microcephalic newborn ZIKV^BR-infected mice, corroborating a causal
relationship. It is unclear why the virus could not cross the placenta
of C57BL/6 mice, but this result may be due to the robust anti-viral
immune response of this mouse strain, which secretes significant
levels of type I/II interferon, known to confer resistance to ZIKV.
These data suggest that genetic differences could explain in part why
some ZIKV-infected pregnant women give birth to newborns without
detectable congenital brain malformations. Nonetheless, our murine
model is a valuable tool for future pre-clinical studies, such as vaccine
development. The presence of major cortical histological
abnormalities in the pups with IUGR prompted us to use an hPSC model to
study the impact of ZIKV on neurodevelopment. ZIKV infects cells
at different stages of brain maturation, leading to alterations in the
cortical layer organization. While this manuscript was under review,
two other papers revealed the impact of previously established ZIKV
strains on human organoids, confirming our observations with ZIKV^BR
(refs 28,29). Finally, our data using non-human primate organoids
suggested that ZIKV^BR might have experienced adaptive changes in
human cells. In fact, it has been demonstrated that the Asian lineage of
ZIKV is undergoing codon usage adaptation towards biases observed in
highly expressed human genes. Our findings support the hypothesis
that microcephaly is a distinctive feature of the recent ZIKV Asian-lineage
virus, which originated in the Pacific and is now spreading in South
and Central America.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment.

Viral culture and amplification. A lyophilized ZIKV isolate from a clinical case
in Brazil (ZIKV^BR), provided by the Evandro Chagas Institute in Belém, Pará,
was reconstituted in 0.5 ml of sterile DEPC water. The African-lineage MR-766
(ZIKV^AF), a reference strain isolated in Uganda in 1947, and the Yellow Fever
vaccine strain (YFV-17D), both used here as controls, were provided by the
Institute Pasteur in Dakar, Senegal. Aedes albopictus mosquito cells (C6/36 cells)
were previously prepared to culture the three viruses. C6/36 cell culture was
maintained using Leibovitz's L-15 medium supplemented with 10% fetal bovine
serum (FBS) (Gibco), 1% non-essential amino acids (Gibco), 1% sodium pyruvate
(Gibco), 1% penicillin/streptomycin (Gibco), 0.05% of amphotericin B (Gibco)
and kept at 27°C in the absence of CO2. After reaching an approximately 70%
confluent monolayer, 50 µl of each viral sample was inoculated into C6/36 with an
hour of adsorption, with gentle shaking every 10 min to allow the homogeneous
adsorption of the viruses. At the end of the adsorption period, 5 ml of the culture
media was added, plus 2% FBS, 1% non-essential amino acids and 1% sodium
pyruvate. The cultures were then incubated under the same adsorption conditions.
In the first subculture (T1), the infected cells were less confluent compared to the
control cells but had few noticeable morphological changes. On the fourth day
after infection, the second subculture (T2) was made blindly by transferring 500 µl
of the T1 supernatant, followed by the third subculture, which was collected on
the eighth day after infection (T3). Pronounced cytopathic effects were perceived
beginning at T2. The supernatants were harvested and titrated, and T3 was used for
the experimental inoculation.

Virus titration. Titration (in PFU ml^-1) of each C6/36 subculture was obtained by
plaque assay to determine the amount of infectious viral particles (PFU). The virus
titration was performed in porcine kidney epithelial (PS) cells and in L15 medium
with 5% FBS. Briefly, the virus titration was done using 200 µl of L15 medium
(5% FBS, 1% penicillin/streptomycin, and 1% glutamine) in a 24-well plate. Then,
a serial dilution of each virus stock from ZIKV^BR, ZIKV^AF and YFV-17D in L15
medium was performed, from 10^-1 to 107-1. Then, 200 µl of each dilution was
added in each 24-well plate. After this, 1 x 10° PS cells were seeded in each well of
a 24-well plate for at least 3 h at 37 °C to allow virus adsorption and PS cell
adherence. Later, each well was overlaid with complete carboxymethyl cellulose (CMC)
medium (0.6% in L15 supplemented with 3% FBS). After 5 days of incubation
at 37°C, the plaque visualization was made using blue-black staining solution.
The most appropriate viral dilution was estimated to determine the amount of
infected cells visible (PFU ml^-1). For ZIKV^BR, the first C6/36 subculture had a titre
of 6 x 10°. The following subcultures had titres of 7.5 x 10° (T2) and 4 x 10! (T3).
All the subculture aliquots were stored in cryovials and maintained in liquid nitrogen
or were distributed to the ZIKV Sao Paulo task force.
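
The titre calculation implied by the plaque assay above can be sketched as follows. This is a minimal illustration of the standard plaque-assay arithmetic (plaques divided by the product of dilution and inoculum volume), not the authors' own analysis; the function name and the example plaque count and dilution are hypothetical, and only the 200 µl inoculum volume comes from the text.

```python
def plaque_titre(plaque_count: int, dilution: float, inoculum_ml: float) -> float:
    """Return a titre in PFU/ml: plaques / (dilution x inoculum volume in ml)."""
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution and inoculum_ml must be positive")
    return plaque_count / (dilution * inoculum_ml)


# Hypothetical well (not data from the paper): 40 plaques counted at the
# 10^-6 dilution, with the 200 µl (0.2 ml) inoculum described in the text.
example_titre = plaque_titre(plaque_count=40, dilution=1e-6, inoculum_ml=0.2)
print(f"{example_titre:.2e} PFU/ml")
```

In practice the well chosen for counting is the dilution at which plaques are numerous but still clearly separated, which is what "the most appropriate viral dilution" refers to.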

In vivo infection. Pregnant mice, 6-8 weeks of age, C57BL/6 or SJL (JAX), were
infected intravenously with 200 µl of ZIKV^BR-infected C6/36 cell supernatant
containing 1 x 10°, 4 x 10!° or 1 x 10!* PFU ml^-1 of virus on day 10-13 of
gestation. The animals were observed daily. All the experiments were performed with
the approval of the Institute of Biomedical Sciences Ethics Committee protocol
number 05/2016.

Real-time PCR. RNA was extracted from each sample (cells, supernatant of cell
culture or mouse tissue) using the QIAamp UltraSens Virus Kit (Qiagen) or TRIzol
reagent (Invitrogen). All RNA pellets were resuspended in 30 µl of RNase-free
distilled water, quantified using a NanoDrop spectrophotometer (NanoDrop
Technologies) and stored at -80°C. The set of primers/probes specific for ZIKV
were synthesized by Sigma Life Science, with 5-FAM as the reporter dye for the
probe. The set of primers/probes ZIKV 835, ZIKV 911c and ZIKV 860-FAM
were previously described. The real-time reaction was performed with 10 µl of
each sample and 10 µl of the AgPath-ID One-Step RT-PCR reagents (Applied
Biosystems). The amplification was done in an Applied Biosystems 7500 real-time
PCR system, and involved activation at 45°C for 15 min, 95°C for 15 min followed
by 40 amplification cycles of 95°C for 15 s, 60°C for 15 s, and 72°C for 30 s. The
real-time data were analysed using SDS software from Applied Biosystems. For the
detection and quantification of viral RNA, the threshold cycle (Ct) value of each
sample was compared with a ZIKV plasmid standard curve,
which was obtained by carrying out serial dilutions of a clone of the envelope gene of
an isolate from the 2007 Yap Island outbreak (provided by the Institute Pasteur in
Dakar, Senegal). For the detection and quantification of YFV RNA, a YFV-specific
real-time assay was applied. The fold changes of gene expression were calculated in
comparison to the values for the YFV controls. The positive PCR control Ct value
was used to normalize gene expression and determine fold changes during the
96 hours post-infection. The RPL27 gene (60S ribosomal protein L27) was used
as endogenous control for the PCR reactions. For the TAM receptor detection,
NPCs or neurons from two donors were or were not infected with ZIKV^BR at
an MOI of 10 for 48 h and were submitted to the standard TRIzol (Invitrogen)
protocol for RNA extraction. Total RNA was quantified using a NanoDrop
spectrophotometer and 2 µg was used for further cDNA synthesis using the
SuperScript III reverse transcriptase (Invitrogen) according to the manufacturer's
protocol. PCR was performed using Taqman probes (Extended Data Table 1) and the
QuantStudio 12K Flex Real-Time PCR System (Applied Biosystems). For
normalization, ACTB was used as a housekeeping gene.
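
Quantification against a plasmid standard curve, as described above, is conventionally done by fitting Ct against log10 of the known copy number and inverting the fit for unknown samples. The sketch below illustrates that general approach only; it is not the SDS software's implementation, and the dilution series, Ct values and function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold plasmid dilution series (illustrative values only).
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [33.0, 29.7, 26.4, 23.1, 19.8]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
efficiency = 10 ** (-1.0 / slope) - 1.0  # amplification efficiency from the slope
unknown_copies = copies_from_ct(28.0, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency:.1%}, copies={unknown_copies:.0f}")
```

A slope near -3.3 corresponds to roughly 100% amplification efficiency, which is a common sanity check before trusting the interpolated copy numbers.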

Cell death pathway analysis. One microgram of total RNA from brains of 4 pups,
2 pooled mock and 2 pooled ZIKV^BR-infected from SJL mothers, was submitted
to gene expression analysis for cell death target genes using the RT2 Profiler PCR
Array Mouse Cell Death PathwayFinder (cat. no. PAMM-212ZA, Qiagen)
according to the manufacturer's protocols. qPCR was performed in the QuantStudio 12K
Flex Real-Time PCR System (Applied Biosystems). To evaluate gene expression,
we established a fold change threshold of at least twofold up- or downregulation
compared to mock-infected samples. Statistical analysis was performed using the
RT2 profiler RT-PCR array data analysis software v3.5.
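
The twofold threshold described above can be illustrated with a short sketch. It assumes fold change is expressed as the ratio of infected to mock expression, so that twofold downregulation corresponds to a ratio of 0.5 or less; the gene labels and values are placeholders, not results from the array.

```python
def classify_fold_changes(fold_changes, threshold=2.0):
    """Label each gene as up, down or unchanged relative to mock.

    fold_changes maps gene name -> infected/mock expression ratio.
    """
    if threshold <= 1.0:
        raise ValueError("threshold must be greater than 1")
    labels = {}
    for gene, ratio in fold_changes.items():
        if ratio <= 0:
            raise ValueError(f"fold change for {gene} must be positive")
        if ratio >= threshold:
            labels[gene] = "up"
        elif ratio <= 1.0 / threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Placeholder ratios (hypothetical, not data from the paper).
example = {"geneA": 3.2, "geneB": 0.4, "geneC": 1.3}
print(classify_fold_changes(example))
```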

NPCs, neurons, neurospheres and organoids. We used three human and two
chimpanzee iPSC clones that were previously characterized in the Beltrão-Braga
and Muotri laboratories plus H9 human embryonic stem cells (hESC) for
all the experiments using pluripotent stem cells. All the cell lines tested negative
for mycoplasma contamination. Briefly, high passages of iPSC/hESC colonies
on feeder-free plates were maintained for 5 days with mTeSR media (Stem Cell
Technologies). On the fifth day, the medium was changed to N2 media (DMEM/
F12 medium supplemented with 1X N2 supplement (Invitrogen) and the dual
SMAD inhibitors, 1 µM dorsomorphin (Tocris) and 1 µM SB431542 (Stemgent))
for 48 h. Further, the colonies were detached from the plate and cultured in
suspension as embryoid bodies (EBs) for 5 days at 90 r.p.m. in N2 media with the
dual SMAD inhibitors. The EBs were plated on matrigel-coated plates with NBF
media composed of the following: DMEM/F12 media supplemented with 0.5X N2,
0.5X B27 supplement (Gibco), 20 ng ml^-1 of FGF2 and 1% penicillin/streptomycin.
The emerged rosettes containing the NPCs were manually picked, dissociated and
plated in a double-coated plate with poly-ornithine (10 µg ml^-1, Sigma-Aldrich)
and laminin (2.5 µg ml^-1, Gibco). The NPC population was expanded using NBF
media. The neuronal differentiation induction protocol consisted of treating the
confluent NPC plate with 10 µM ROCK inhibitor for 48 h (Y-27632, Calbiochem)
in the absence of FGF in the media, with regular media changes every 3 or 4 days.
Neurons were considered completely differentiated and ready for experiments after
28 days. To produce neurospheres, NPCs were scraped from the plates and
submitted to continuous shaking for 5-7 days at 90 r.p.m. in NBF media. Cerebral
organoids were generated as previously described. All experiments were
performed with the approval of the Institute of Biomedical Sciences Ethics Committee
protocol number 1001.

In vitro infection. NPCs, neurons, neurospheres and organoids were infected with
ZIKV^BR, ZIKV^AF, YFV and mock (culture supernatant from uninfected C6/36
cells). NPCs were seeded in 24-well plates and after 24 h viral samples were
diluted to the desired MOI (0.1, 1 or 10) and added to the cells. For viral adsorption,
cells in monolayer were incubated for 1 h at 4°C with gentle agitation every 10 min.
Next, the inoculum was removed and cells were washed once with PBS (USB
Corporation). Culture medium was added to each well, and cells were incubated
at 37°C and 5% CO2 for the duration of the experiment. For neurospheres, NPCs
were kept in constant shaking. For neuronal infection, NPCs were previously
differentiated for 28 days and then neurons were infected with the desired MOI. For
organoids, the number of cells available for infection was estimated to be 2.5 x 10^4
cells, as calculated by dividing the average surface area of a typical organoid by the
average area of a typical cell (that is, a fibroblast). This calculation was used to
set the desired MOI of 0.1. For mock controls, the same volume of supernatant was
added to each experiment, and the same procedures were followed.
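
The MOI arithmetic described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the stock titre is a made-up round number, and the surface and cell areas are placeholder values chosen only so that the quotient reproduces the 2.5 x 10^4 cell estimate quoted in the text.

```python
def cells_on_surface(surface_area_um2: float, cell_area_um2: float) -> float:
    """Estimate exposed cells as organoid surface area / area of one cell."""
    if surface_area_um2 <= 0 or cell_area_um2 <= 0:
        raise ValueError("areas must be positive")
    return surface_area_um2 / cell_area_um2


def inoculum_volume_ul(moi: float, n_cells: float, titre_pfu_per_ml: float) -> float:
    """Volume of virus stock (µl) that delivers MOI x n_cells infectious units."""
    if moi <= 0 or n_cells <= 0 or titre_pfu_per_ml <= 0:
        raise ValueError("moi, n_cells and titre must be positive")
    pfu_needed = moi * n_cells
    return pfu_needed / titre_pfu_per_ml * 1000.0  # ml -> µl


# Placeholder areas (hypothetical): 2.5e7 µm^2 of surface, 1e3 µm^2 per cell.
n_cells = cells_on_surface(2.5e7, 1.0e3)

# Hypothetical stock titre of 1e6 PFU/ml; MOI of 0.1 as used for organoids.
volume = inoculum_volume_ul(moi=0.1, n_cells=n_cells, titre_pfu_per_ml=1e6)
print(f"{n_cells:.0f} cells, {volume:.2f} µl of stock")
```

Because only surface cells are counted, this estimate treats the organoid interior as inaccessible at the time of inoculation, which is why the effective MOI is defined relative to the surface rather than the total cell number.
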
Immunofluorescence. Cells were fixed using paraformaldehyde, 4% in PBS, for
15 min at room temperature. After washing, the cells were permeabilized with 0.1%
Triton X-100 (Promega) diluted in PBS for 15 min. After blocking with 2% of BSA
(Sigma-Aldrich) for 4 h, primary antibodies directed against the following were
added: anti-ZIKV (polyclonal mouse, Institute Pasteur in Dakar, 1:80), anti-Flavivirus
D1-4G2-4-15 (polyclonal mouse, Millipore, 1:100), 1:50, anti-MAP2 (chicken,
Abcam ab5392, 1:200), anti-cleaved-caspase-3 (rabbit, Cell Signaling #9661, 1:400),
anti-Sox2 (mouse, Abcam ab97959), anti-GFAP (rabbit, Abcam, 1:500) and
anti-Musashi1 (rabbit, Abcam, 1:1000) (Extended Data Table 2). The cells were incubated
overnight at 4°C. Secondary antibodies were added for a one-hour incubation at
room temperature: anti-mouse Alexa Fluor 488, anti-chicken Alexa Fluor 647, anti-rat
Alexa Fluor 555 and anti-rabbit Alexa Fluor 555 (Invitrogen). The nuclei were
stained with DAPI (Invitrogen, 1:10,000) diluted in a 1x PBS solution for 5 min
and mounted with DPX (Sigma). Images were acquired with a Nikon Eclipse 80i.
Analysis of data was performed using NIS Elements 3.22 software (Tokyo, Japan).
Cerebral organoids analyses. Human and chimp organoids were infected with an
MOI of 0.1 and analysed after 24 and 96 h post-infection. Organoids were
cryosectioned at 20 µm. Immunofluorescence was performed after blocking sections in a
solution with 0.1% Triton and 3% BSA (Gemini) for 1 h at room temperature. The
primary antibodies were diluted in a solution with 0.1% Triton and 3% BSA, and
the sections were incubated with the following antibodies: anti-ZIKV, anti-Flavivirus,
anti-MAP2, anti-cleaved-caspase-3 and anti-Sox2, all mentioned above, and
anti-PAX6 (rabbit, Covance PRB-278P, 1:100), anti-TBR1 (rabbit, Abcam ab31940,
1:300), CTIP2 (rat, Abcam ab18465, 1:100) and Ki67 (rabbit, Abcam ab15580,
1:100). The sections were blocked with 0.1% Triton (Sigma-Aldrich) and 3% BSA
for 30 min at room temperature and the secondary antibodies previously diluted,
the same mentioned above, were added. The nuclei were stained with DAPI, as
mentioned above, and slides were mounted with DPX (Sigma-Aldrich).
Transmission electron microscopy. The cell pellet was fixed using a 3%
glutaraldehyde solution (Merck) at 4°C for 2 h, rinsed in three changes of PBS for 1 h, and
incubated for 16 h at 4°C. The next day, post-fixation was performed with 1% of
osmium tetroxide for 30 min at room temperature. Dehydration was carried out
gradually with a series of ethanol concentrations: 70%, 95% and 100%. The sample was
taken through two changes of propylene oxide and placed at a 1:1 ratio with
embedding medium for 1 h in a rotary mixer followed by 100% embedding medium at
room temperature for 24 h. Fresh embedding medium was placed overnight at
37°C and polymerized in an oven for 24 h. Ultrathin sections were cut and stained
with uranyl acetate and lead citrate. The cells were visualized with a transmission
electron microscope (JEOL, JEM 1011, Peabody, Massachusetts, USA). All
experimental analyses were performed blinded to the treatment.

Flow cytometry. Cells were infected with an MOI of 10 or 1, prepared using
supernatants from infected C6/36 cells, and with an equal volume of mock. Cellular infection
occurred at 4 °C for 1 h with cell homogenization every 10 min. After that, the cells
were washed once and then maintained at 37 °C in CO2 incubators with medium,
as described before. At 24, 48, 72 and 96 h post-infection the cells were harvested
and then submitted to a staining protocol for annexin V and propidium iodide
(PI) (BD Biosciences). The cells were washed twice with PBS and were harvested
with 200 μl of 0.25% trypsin (LGC) for 10 min at 37 °C. Next, the cell suspensions
were washed in DMEM with 10% FBS and centrifuged for 5 min at 450g and
4 °C. The cells were then resuspended in 20 μl of annexin V binding buffer in
96-well round-bottom plates with 1 μl of FITC-annexin V + 1 μl of PI and then
incubated at room temperature for 15 min protected from light. After the incubation
period, the samples were added to 80 μl of binding buffer and acquired in the BD
FACS Accuri C6 (BD Biosciences) flow cytometer.
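
As a worked illustration of the MOI values used in these infections (10, 1 and 0.1), the following sketch computes the volume of virus stock needed for a given MOI. It relies only on the definition of MOI as plaque-forming units added per cell. The cell number and stock titre are hypothetical placeholders, not values reported in this study, and the function name is ours.

```python
def inoculum_volume_ml(moi: float, n_cells: float, titre_pfu_per_ml: float) -> float:
    """Return the volume of virus stock (ml) that delivers `moi` PFU per cell.

    MOI is the number of infectious units (PFU) added per cell, so the
    required PFU is moi * n_cells and the volume is that PFU divided by the titre.
    """
    if moi < 0 or n_cells <= 0 or titre_pfu_per_ml <= 0:
        raise ValueError("moi must be >= 0; n_cells and titre must be > 0")
    return moi * n_cells / titre_pfu_per_ml


# Hypothetical example values (assumptions, not taken from the paper):
N_CELLS = 1e5   # cells per well
TITRE = 1e7     # stock titre in PFU per ml

for moi in (10, 1, 0.1):
    volume_ul = inoculum_volume_ml(moi, N_CELLS, TITRE) * 1000
    print(f"MOI {moi}: {volume_ul:.1f} µl of stock")
```

Mock-infected controls would receive the same volume of supernatant from uninfected cells, as the text describes.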

Computed tomography. Mice were properly anaesthetized with isoflurane and
immobilized on their right side on the bed with a piece of gauze and positioned
with the whole body in the field of view (FOV). CT images were acquired using
small animal imaging equipment (Triumph Trimodality Gamma Medica Ideas)
with 45 kVp, 0.4 mA and 2.13 min of X-ray exposure (512 projections over 360°).
The images were reconstructed using the filtered back projection (FBP) algorithm,
a matrix of 512 × 512, a smoothing filter and a pixel size of 92-117 μm (according to
the animal size). Experienced evaluators, who were blinded to the animal group
assignments, performed image analyses using AMIDE 1.0.4, General Public
License software. Fiducial marks were added to measure the distance between
points considering, visually, the larger axis of the brain in the sagittal plane for
posterior-anterior and superior-inferior distances and the coronal plane for the lateral
right-left distance. For measuring whole-body length, the distance between the superior
point of the brain and the first vertebra of the tail was used. The thorax measure was
made from the third rib (right-left) in the coronal plane to the spinal cord level.
The results are presented in mm.
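
The acquisition and reconstruction parameters above imply a few simple geometric quantities, sketched below. The sketch assumes evenly spaced projections and square pixels; the fiducial coordinates in the example are hypothetical, and the function names are ours rather than part of AMIDE.

```python
import math

N_PROJECTIONS = 512    # projections per rotation (from the text)
ROTATION_DEG = 360.0   # angular range covered (from the text)
MATRIX_SIZE = 512      # reconstruction matrix is 512 x 512 (from the text)


def angular_step_deg(n_projections: int = N_PROJECTIONS,
                     rotation_deg: float = ROTATION_DEG) -> float:
    """Angle between consecutive projections, assuming even spacing."""
    return rotation_deg / n_projections


def fov_side_mm(pixel_um: float, matrix_size: int = MATRIX_SIZE) -> float:
    """Side length of the reconstructed slice in mm, assuming square pixels."""
    return matrix_size * pixel_um / 1000.0


def fiducial_distance_mm(p, q) -> float:
    """Euclidean distance between two fiducial marks given in mm."""
    return math.dist(p, q)


print(angular_step_deg())
print(fov_side_mm(92), fov_side_mm(117))   # range of pixel sizes quoted in the text
# Hypothetical fiducial coordinates (mm), for illustration only:
print(fiducial_distance_mm((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

Under these assumptions the quoted pixel sizes correspond to a reconstructed slice of roughly 47 to 60 mm on a side, which is consistent with imaging the whole body of a newborn pup.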

Histologic processing. Tissue histology was performed using a dehydrating
protocol with two baths of 95% alcohol (the first for 1 h 15 min and the second
for 30 min), three absolute alcohol baths (1 h for the first, followed by 3 h for the
second and 2 h for the third), followed by clarification with three baths of xylene
(the first for 30 min and the next two for 1 h each). Finally, the material was added
to three paraffin baths (the first for 30 min and the next two for 1 h each). Then
the material was immersed in paraffin and cut with a microtome to a thickness of
5 μm. The deparaffinization protocol consisted of three xylol baths heated in an
oven for 30 min each, two baths of absolute alcohol for 2 min each, two baths of 95%
alcohol lasting 2 min each, an alcohol in water bath (85%) for 2 min and the last
bath in 70% alcohol for 2 min. The haematoxylin and eosin staining protocol began
with two quick baths in running water, followed by a 2-min bath in distilled water,
a 2-min bath in haematoxylin, a 5-min bath in running water, a 1-min bath in
eosin, followed by 1 min in a fast-flowing water bath, two baths of 95% ethanol for
2 min each, two baths of absolute ethanol for 2 min each, ending with three baths in
xylene for 2 min each. Slides were mounted using Permount (Sigma-Aldrich) and
analysed on multiple coronal slices in glass slides using light microscopy (Olympus
BX40, ZEISS KS400) in a genotype-blinded fashion.
Extended Data Figure 1 | Impact of ZIKV^BR infection in the C57BL/6
and SJL mice. a, Scheme for infecting mice and the follow-up analyses.
Pregnant females at approximately day 10-13 of gestation were challenged
with 4 × 10^10 PFU of ZIKV^BR via an intravenous route. Their pups
were analysed immediately after birth for signs of malformation.
b, A representative pup from mock-infected and the ZIKV^BR-infected
C57BL/6 mice. Scale bar, 1 cm. c, C57BL/6 pups born with no gross
morphological changes or size differences compared to mock controls
(n = 21 pups from three separate litters; error bars, s.e.m.; t-test).
d, e, CT analysis confirmed lack of anatomical alterations (n = 21 pups
from three separate litters; error bars, s.e.m.; t-test). For scale, the crosses
indicate 1.2 mm (top left; top right; bottom left) and 0.6 mm (bottom
right). f, ZIKV^BR RNA was not detected in the brains of six C57BL/6
pups. g, Cell death pathway signature revealed by qPCR gene expression
in the brains of the ZIKV^BR-infected SJL pups (n = 2 technical replicates of
pooled RNA from two pups each group; threshold = twofold). h, Heat map
representation of misregulated genes in g.
Extended Data Figure 2 | Histopathological analysis of brains from
ZIKV^BR-infected SJL pups. Morphological aspect of hippocampus,
thalamus, hypothalamus and cerebellum from brains of pups born from
mothers infected with ZIKV^BR. Arrowheads indicate intranuclear vacuoles
and ‘empty’ nuclei aspect with chromatin margination observed in
thalamus and hypothalamus. Scale bar from left to right: 100 μm, 100 μm,
50 μm and 10 μm.
Extended Data Figure 3 | Impact of ZIKV infection in human NPCs
and neurons. a, Scheme of the in vitro experiments using hPSCs. The
cells were differentiated into NPCs, neurons, neurospheres and cerebral
organoids to test the impact of ZIKV^BR over time. b, Infection of NPCs
with the ZIKV^BR and ZIKV^AF (MOI = 1) at 24 and 96 h post-infection.
Scale bar, 25 μm. c, Aspects of iPSC-derived human neurons after ZIKV
infection (MOI = 1) at 24 and 96 h post-infection. Scale bars, 200 μm
(bright field); 25 μm (immunofluorescence). d, Viral replication dynamics
in human NPCs over time (MOI = 1) (n = 2 technical replicates from
two different donors; error bars, s.e.m.). e, Viral replication dynamics in
human neurons over time (MOI = 1) (n = 2 technical replicates from two
different donors; error bars, s.e.m.). f, Dynamics of NPCs toxicity over
time after ZIKV infection (MOI = 1), indicating no significant differences
among the different viruses (n = 2 technical replicates from two different
donors; error bars, s.e.m.). g, h, Viral replication dynamics of ZIKV in
human neurons over time at MOI = 10 and MOI = 1, respectively (n = 2
technical replicates from two different donors; error bars, s.e.m.; one-way
ANOVA).


© 2016 Macmillan Publishers Limited. All rights reserved 


[Figure panels (Extended Data Figure 4): neurospheres infected with ZIKV^BR or ZIKV^AF, or mock-infected; x axis, time (hours).]

Extended Data Figure 4 | Impact of ZIKV infection in human [...] different donors). d, Representative bright-field images of ZIKV infection [...]

Extended Data Figure 5 | Human and chimp cerebral organoids
infected with ZIKV. a, Representative image of an entire cross-section
of a human cerebral organoid infected with the ZIKV^BR (MOI = 0.1,
24 h post-infection). Scale bar, 200 μm. b, Detail of the surface of a human
organoid infected with the ZIKV^BR at 24 h post-infection (MOI = 0.1).
Marginal zone (MZ) and cortical plate (CP) delineated by dotted white
lines. Scale bar, 200 μm. c, Detail of the surface of a human organoid
infected with the ZIKV^BR at 96 h post-infection (MOI = 0.1). Notice
the significant tissue damage and reduction in the CP at 24 h
post-infection. Scale bar, 200 μm. d, A representative characterization of CP
and ventricular zone (VZ) in human organoid infected with the ZIKV^BR
at 24 and 96 h post-infection (MOI = 0.1). Scale bar, 50 μm. e, Reduction
in the cortical thickness measured by the extension of TBR1-positive
layer of cells in human organoids at 96 h post-infection (MOI = 0.1; n = 3
replicates from three human cell lines; error bars, s.e.m.; unpaired t-test,
*P = 0.0203). f, Reduction in the cortical thickness measured by the
extension of CTIP2-positive cells layer in human organoids at 96 h
post-infection (MOI = 0.1; n = 3 replicates from three human cell
lines; error bars, s.e.m.; unpaired t-test, ***P = 0.0005). g, Nuclear
size (diameter) of cleaved-caspase-3-positive apoptotic cells in human
organoids at 96 h post-infection (MOI = 0.1; n = 10 organoids/slides from
three human cell lines; error bars, s.e.m.; unpaired t-test, ***P = 0.0004).
Scale bar, 50 μm. h, Percentage of TUNEL-positive cells in relation to
controls (dotted line) at 24 and 96 h post-infection (MOI = 0.1; n = 10
organoids/slides from three human cell lines; error bars, s.e.m.; ANOVA,
**P = 0.0042). i, Percentage of TBR1-positive cells in non-human primate
organoids (chimp) in relation to controls (dotted line) at 24 and 96 h
post-infection (MOI = 0.1; n = 3 organoids from two donors; error bars,
s.e.m.; ANOVA). j, Percentage of CTIP2-positive cells in non-human primate
organoids (chimp) in relation to controls (dotted line) at 24 and 96 h
post-infection (MOI = 0.1; n = 3 organoids from two donors; error bars,
s.e.m.; ANOVA). k, Viral replication dynamics in chimpanzee organoids
over time (MOI = 0.1; n = 3 replicates from two donors; error bars, s.e.m.;
ANOVA).
Extended Data Table 1 | Probes used for TAM receptors detection

Target     Species   Code
Tyro-3     Human     Hs_00170723_m1
Axl        Human     Hs_01064444_m1
MertK      Human     Hs_01031973_m1
DC-Sign    Human     Hs_01588349_m1
β-actin    Human     Hs_01060665_m1
Extended Data Table 2 | Antibodies used in this study, related to experimental procedures

Cell type/tissue | Antigen | Host | Supplier | Cat number | Dilution
Zika virus | ZIKV | [illegible] | Institute Pasteur in Dakar | MAB10216 | 1:50
Flavivirus | Flavivirus | [illegible] | Millipore | ab5392 | 1:200
Neurons | MAP2 | [illegible] | Abcam | ab5392 | [illegible]
Apoptosis | Cleaved caspase-3 | Rabbit polyclonal | Cell Signaling | [illegible] | [illegible]
Progenitor | Sox2 | Mouse | Abcam | ab79351 | 1:200
Neuronal | [...]
Overcoming mTOR resistance mutations with a
new-generation mTOR inhibitor


Vanessa S. Rodrik-Outmezguine*, Masanori Okaniwa*, Zhan Yao, Chris J. Novotny, Claire McWhirter, Arpitha Banaji,
Helen Won, Wai Wong, Mike Berger, Elisa de Stanchina, Derek G. Barratt, Sabina Cosulich, Teresa Klinowska, Neal Rosen
& Kevan M. Shokat


Precision medicines exert selective pressure on tumour cells 
that leads to the preferential growth of resistant subpopulations, 
necessitating the development of next-generation therapies to treat 
the evolving cancer. The PIK3CA-AKT-mTOR pathway is one of 
the most commonly activated pathways in human cancers, which
has led to the development of small-molecule inhibitors that target 
various nodes in the pathway. Among these agents, first-generation 
mTOR inhibitors (rapalogs) have caused responses in ‘N-of-1’ 
cases, and second-generation mTOR kinase inhibitors (TORKi) are 
currently in clinical trials. Here we sought to delineate the likely
resistance mechanisms to existing mTOR inhibitors in human cell 
lines, as a guide for next-generation therapies. The mechanism of 
resistance to the TORKi was unusual in that intrinsic kinase activity 
of mTOR was increased, rather than a direct active-site mutation 
interfering with drug binding. Indeed, identical drug-resistant 
mutations have been also identified in drug-naive patients, 
suggesting that tumours with activating MTOR mutations will 
be intrinsically resistant to second-generation mTOR inhibitors. 
We report the development of a new class of mTOR inhibitors 
that overcomes resistance to existing first- and second-generation 
inhibitors. The third-generation mTOR inhibitor exploits the 
unique juxtaposition of two drug-binding pockets to create a 
bivalent interaction that allows inhibition of these resistant 
mutants. 

The MCF-7 breast cancer cell line was exposed to high concentrations
of either a first-generation mTORC1 inhibitor, rapamycin, or
a second-generation mTOR ATP-competitive inhibitor, AZD8055
(a TORKi), for 3 months, until resistant colonies emerged. Deep
sequencing revealed that the AZD8055-resistant (TKi-R) clones
harboured an MTOR mutation located in the kinase domain at the
M2327I position (Fig. 1a and Extended Data Fig. 1a), while two
rapamycin-resistant (RR) clones contained mutations located in the
FKBP12-rapamycin-binding domain (FRB domain) at positions
A2034V (RR1 cells) and F2108L (RR2 cells). The clinical relevance of
these mutations is supported by a case report of a patient who acquired
the identical F2108L MTOR mutation after relapse while under
treatment with everolimus (Extended Data Table 1).

To verify that the mutations altered the efficacy of their respective
drugs and were not simply passenger mutations, we analysed the
phosphorylation of effectors downstream of mTOR in several cellular
systems. In the RR cells, phosphorylation of the normally
rapamycin-sensitive sites on S6K (T389) and S6 (S240/244 and S235/236) was
unaffected even at high rapalog concentrations (100 nM) (Fig. 1b and
Extended Data Fig. 1b). Phosphorylation of the key mTOR effector
4EBP1 is normally unaffected by rapamycin but strongly reduced by
TORKi. In the TKi-R cells, however, 4EBP1 phosphorylation was
significantly less sensitive to a variety of TORKi (Fig. 1c and Extended
Data Fig. 1c, d). Consistent with this weakened signalling inhibition,
the RR and TKi-R clones were significantly less sensitive to their
respective drugs in a 72 h proliferation assay when compared to the
parental line (Fig. 1d, e and Supplementary Table 1). To determine
if the RR and TKi-R MTOR mutations were directly responsible for
the drug-resistance phenotype, each mutant was expressed in another
model, MDA-MB-468 cells, which confirmed that the MTOR
mutations are sufficient to promote dominant resistance (Extended Data
Fig. 2a-d).

FRB domain mutations have been found in untreated patients 
(Extended Data Table 2), and previous random mutagenesis screens 
in yeast have shown that single amino acid changes in the mTOR FRB 
domain confer rapamycin resistance. The RR mutants identified
in this screen exhibit a similar mechanism of resistance by disrupting 
the interaction of mTOR with the FKBP12-rapamycin complex in cells 
and in vitro (Fig. 2a, b). 

In contrast to the FRB-domain mutations found in RR cells, which 
line the rapalog/FKBP-binding pocket, analysis of the recently solved 
structure of the mTOR kinase domain in complex with the TORKi, 
PP242 (Protein Data Bank (PDB), 4JT5), revealed that M2327 is
>15 Å away from the inhibitor, suggesting either an allosteric mechanism
of reduced TORKi affinity or that this mutation causes resistance
through a mechanism that does not involve reduced drug binding. 
Indeed, both wild-type and M2327I mTOR bind AZD8055 with sim- 
ilar affinities (Fig. 2c). We asked whether the M2327I mutation in 
the mTOR kinase domain altered the kinetic properties of the kinase. 
As shown in Fig. 2d, the M2327I mutant has a threefold increase in 
mTOR kinase activity compared with the wild-type and RR mutants. 
This is consistent with the higher phosphorylated (p)-S6K (T389), 
p-AKT (S473) and p-4EBP1 S65 basal levels observed in these cells 
(Extended Data Fig. 1d). 

The emergence of a hyperactive MTOR kinase domain mutation
(M2327I) that could theoretically confer a growth advantage led us
to wonder if similar mutations might pre-exist in drug-naive patient
tumours. Indeed, the precise M2327I mutation as well as other MTOR
kinase domain mutations have been identified in five untreated
patients (Extended Data Tables 1 and 3). To determine if additional
MTOR kinase domain mutants were also hyperactive and insensitive
to TORKi, various MTOR kinase domain mutations that occur in
patients were inducibly expressed in MDA-MB-468 cells and tested
for sensitivity to the TORKi AZD8055 and MLN0128 (Extended Data
Fig. 2d, e). The concentrations of drug required to inhibit mTORC1
and mTORC2 substrates in these cells were 3- to 30-fold higher than


1Program in Molecular Pharmacology, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA. 2Howard Hughes Medical Institute and Department of Cellular and Molecular
Pharmacology, University of California San Francisco, San Francisco, California 94158, USA. 3AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK. 4Human Oncology and
Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA. 5Anti-Tumor Assessment Core, Memorial Sloan Kettering Cancer Center, New York, New York
10065, USA. 6Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA. 7Department of Chemistry, University of California Berkeley, Berkeley, California
94720, USA.
*These authors contributed equally to this work.


00 MONTH 2016 | VOL 000 | NATURE | 1 



those required in wild-type cells, although not all substrates show
precisely the same dose response.

These data suggest that the hyperactivation of mTOR kinase by
single amino acid mutations found in drug-naive patients can reduce
the sensitivity to ATP-competitive mTOR inhibitors in cells. These
findings highlight the need for a new class of mTOR inhibitor
capable of targeting both drug-naive (pre-existing) MTOR mutant-driven
cancers, as well as emergent resistant mutations.

We developed a molecular model of mTOR in complex with
rapamycin-FKBP12 using the FRB domain as the common domain
in two available mTOR crystal structures (PDB, 1FAP and 4JT5)
(Fig. 3a). This model revealed the juxtaposition of the rapamycin-
and TORKi-binding sites and suggested an avidity-based approach
to overcome drug-resistant mutations in either the FRB or the kinase
domain. A bivalent mTOR inhibitor consisting of a rapamycin-FRB-binding
element appropriately linked to a TORKi would be expected
to inhibit the RR class of FRB-domain mutants because the
TORKi-binding site would provide high-affinity recognition. For the TKi-R
class of kinase domain mutations, a bivalent inhibitor would be
predicted to be similarly potent by virtue of an intact rapamycin-binding
site. We reasoned that binding at one site would position the second
half of the ligand in close proximity for binding to the second site, thus
overcoming point mutations that diminish drug binding (as found in
RR cells) or that hyperactivate the kinase (as found in TKi-R cells).
To develop a new bivalent class of mTOR inhibitors, we required a
non-perturbing, strain-free linker between rapamycin and a TORKi,
such that the resulting inhibitor can simultaneously bind to both sites.
Analysis of our mTOR-rapamycin-FKBP12 model revealed that the
hydroxyl group at the C40 position of rapamycin is exposed to
solvent and is oriented towards the ATP-binding site of mTOR (Fig. 3a).
Analysis of the TORKi (PP242)-bound structure (PDB, 4JT5) revealed
that the N-1 position of the pyrazole ring is oriented towards
rapamycin and exposed to solvent (Extended Data Fig. 3a). We selected
MLN0128 as the TORKi as it is a highly selective structural analogue
of PP242, and is currently in clinical trials.

Figure 1 | Single amino acid mutation accounts for acquired resistance
to mTOR inhibitors. a, Graphic representation of mTOR domains and site
mutagenesis isolated in rapamycin- and AZD8055-resistant cells. b, c, The
effects of rapamycin (b) or AZD8055 (c) on mTOR signalling were assessed
in MCF-7, RR1 and RR2 cells (or in TKi-R cells (c)) by immunoblotting
4 h after treatment. For gel source data, see Supplementary Fig. 1.
d, e, Dose-dependent cell growth inhibition curves of MCF-7 and
rapamycin-resistant MCF-7 A2034V (RR1) and MCF-7 F2108L (RR2)
cells treated with rapamycin at day 3 (d) or MCF-7 and AZD8055-resistant
MCF-7 M2327I (TKi-R) cells treated with AZD8055 (e). Each dot and
error bar on the curves represents mean ± standard deviation (s.d.)
(n = 8). All experiments were repeated at least three times.
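
The dose-response comparisons in this Letter (growth inhibition in Fig. 1d, e, the IC50 values in Fig. 2c, and the 3- to 30-fold shifts in required drug concentration) rest on a sigmoidal dose-response model; the Fig. 2 legend names a standard four-parameter logistic fitted in GraphPad Prism. The sketch below shows one common form of that model and a fold-shift calculation. It is a generic illustration, not necessarily the exact parametrization used by Prism, and every numeric value in the example is hypothetical.

```python
def four_parameter_logistic(conc: float, bottom: float, top: float,
                            ic50: float, hill: float) -> float:
    """Response at drug concentration `conc` (same units as `ic50`).

    For an inhibitor with hill > 0 the response falls from `top` at zero
    drug towards `bottom` at saturating drug, and is exactly midway
    between the two when conc == ic50.
    """
    if conc < 0 or ic50 <= 0:
        raise ValueError("conc must be >= 0 and ic50 must be > 0")
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fold_shift(ic50_mutant: float, ic50_wild_type: float) -> float:
    """How many times more drug the mutant needs for half-maximal inhibition."""
    return ic50_mutant / ic50_wild_type


# Hypothetical parameters (assumptions for illustration only):
wt_ic50_nm, mutant_ic50_nm = 5.0, 150.0
for conc_nm in (0.0, 5.0, 150.0):
    wt = four_parameter_logistic(conc_nm, 0.0, 100.0, wt_ic50_nm, 1.0)
    mut = four_parameter_logistic(conc_nm, 0.0, 100.0, mutant_ic50_nm, 1.0)
    print(f"{conc_nm:6.1f} nM  wild-type {wt:5.1f}%  mutant {mut:5.1f}%")
print("fold shift:", fold_shift(mutant_ic50_nm, wt_ic50_nm))
```

In practice the four parameters would be estimated from measured responses by nonlinear least squares; the functions above only evaluate the model once the parameters are known.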

To determine the optimum linker length between the chosen sites,
we used the molecular modelling program Molecular Operating
Environment to evaluate the potential energy of a methylene-based
cross-linker with lengths from 10 to 40 heavy atoms. This analysis
revealed that 27 atoms would be the minimal length required to span
the two ligand-binding sites (Extended Data Fig. 3b). We incorporated
a polyethylene glycol unit of varying lengths and used the azide-alkyne
cycloaddition reaction to synthesize RapaLink-1, -2 and -3 (Fig. 3b,
Extended Data Fig. 3c and Supplementary Methods). Our modelling
suggested that RapaLink-3, with an 11-heavy-atom linker, would be
too short to allow optimal binding to both sites simultaneously, while
RapaLink-1 and -2, which contain 39- and 36-heavy-atom linkers,
respectively, would allow simultaneous bivalent binding to the
mTOR-FKBP12 complex.
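
The linker-length argument above reduces to comparing each compound's linker against the 27-heavy-atom minimum from the modelling. A minimal sketch of that comparison follows; it only encodes the numbers stated in the text and does not reproduce the potential-energy calculation itself. The names used are ours.

```python
MIN_SPANNING_HEAVY_ATOMS = 27  # minimal linker length reported from the modelling

# Linker lengths in heavy atoms, as stated in the text.
LINKER_HEAVY_ATOMS = {
    "RapaLink-1": 39,
    "RapaLink-2": 36,
    "RapaLink-3": 11,
}


def can_span_both_sites(n_heavy_atoms: int,
                        minimum: int = MIN_SPANNING_HEAVY_ATOMS) -> bool:
    """True if the linker is at least as long as the modelled minimum."""
    return n_heavy_atoms >= minimum


for name, length in LINKER_HEAVY_ATOMS.items():
    verdict = "long enough" if can_span_both_sites(length) else "too short"
    print(f"{name}: {length} heavy atoms, {verdict} for bivalent binding")
```

The threshold test matches the prediction in the text: RapaLink-1 and -2 exceed the minimum, whereas RapaLink-3 falls well below it.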

Cells were treated with increasing concentrations of either
RapaLink-1, -2 or -3, and the effects on mTOR signalling were assessed
by western blotting. We observed that both RapaLink-1 and -2
inhibited the phosphorylation of both mTORC1 and mTORC2 targets
at doses between 1 and 3 nM (Fig. 3c). However, RapaLink-3, which
contains the shortest linker, showed diminished potency against the
phosphorylation of 4EBP1 (T37/46/70 and S65) and AKT (S473) while
still inhibiting p-S6 (S240/244 and S235/236). This is consistent with
the prediction that a longer linker is necessary to allow simultaneous
binding to both drug sites and indicates that rapamycin binding is
dominant over MLN0128 binding due to the preferential inhibition
of p-S6 over p-4EBP1. Consistent with its strong signalling inhibition
(Fig. 3d), RapaLink-1 potently inhibited the growth of MCF-7 cells at
levels comparable to rapamycin or a combination of rapamycin with
MLN0128 (Extended Data Figs 4a and 5).

Figure 2 | Non-overlapping mechanisms of resistance mediated by
mTOR mutations. a, mTOR-Flag wild-type (WT) and variants were
transfected into 293H cells. Cells were treated with rapamycin and lysates
were immunoprecipitated (IP) with an anti-Flag antibody. mTORC1
complex formation was assessed by immunoblotting. b, 293H cells
were transfected and complex isolated as described in a, and an in vitro
competition assay was performed followed by immunoblotting.
Shorter and longer exposure (exp.) are shown. For gel source data, see
Supplementary Fig. 2. c, Varying concentrations of AZD8055 were tested
in vitro on wild-type and M2327I mTOR followed by a kinase reaction
(see Methods). The half-maximum inhibitory concentration (IC50) values
were determined by fitting to a standard four-parameter logistic using
GraphPad Prism v.5. The diagram shows the mean of n = 3 data. The
error bars represent the s.d. between experiments. d, 293H cells were
transfected and the complex was isolated as described in a. An in vitro
kinase assay was performed and the level of p-AKT (S473) was determined
by immunoblotting. Symbols on each curve represent the relative p-AKT
at different time points. The kinase activity curves were generated using
GraphPad Prism v.6 after densitometry analysis was performed. All
experiments were repeated at least three times.

Figure 3 | RapaLink-1 is a potent mTOR inhibitor. a, Molecular model
constructed from two available co-crystal structures, mTOR
catalytic-domain-bearing TORKi PP242 (PDB, 4JT5) and mTOR
FRB-domain-rapamycin-FKBP12 (PDB, 1FAP). Dotted line represents a guide line
for the linker design of bivalent mTOR inhibitors. b, RapaLink-1 structure
is displayed. c, d, MCF-7 cells were treated with RapaLink-1, -2 and -3 (c)
or with rapamycin, MLN0128, or a combination of rapamycin and
MLN0128 or RapaLink-1 (d) for 4 h followed by immunoblotting. The
rapamycin panel is the same as that shown in Fig. 1b and the RapaLink-1
panel is the same as that shown in c. All cellular experiments were repeated
three times. For gel source data, see Supplementary Fig. 3.

Figure 4 | RapaLink-1 reverses resistance due to mTOR FRB and kinase
domain mutations. a, b, e, MDA-MB-468 cells inducibly expressing
mTOR F2108L (a) or M2327I (b) or F2108L/M2327I mTOR double
mutant (e) were treated as in Fig. 3d, followed by immunoblotting.
For gel source data, see Supplementary Figs 4, 5 and 6. All experiments
were repeated at least three times. c, d, Mice bearing RR1 (c) or TKi-R (d)
xenograft tumours (n = 5 for each group) were randomized to four
different groups: (1) vehicle (Monday (M), Wednesday (W), Friday (F));
(2) rapamycin (10 mg kg^-1; M, W, F); (3) AZD8055 (75 mg kg^-1; M, W, F);
and (4) RapaLink-1 (1.5 mg kg^-1; weekly). Tumour size was measured
by calliper twice per week. The results were reported as tumour volume
(mm^3) ± s.d.

We further tested the requirement of both halves of RapaLink-1
to simultaneously bind mTOR. First, we measured the ability of
RapaLink-1 to recruit FKBP12 to mTOR by performing an in vitro
FKBP12-binding assay; we show that RapaLink-1 is indeed able to
recruit glutathione S-transferase (GST)-FKBP12 to wild-type mTOR
(Extended Data Fig. 4b, lane 12). Moreover, we used the FKBP12
competitive ligand, FK506, to pharmacologically block RapaLink-1
from interacting with FKBP12 and thus mTOR. We observed that
FK506 completely rescued the phosphorylation of mTORC1 and mTORC2
substrates upon RapaLink-1 treatment (Extended Data Fig. 4c). Last,
we isolated MCF-7 RapaLink-1-resistant cells; these cells harbour a
mutation located in the mTOR FRB domain at position F2039S. As
shown in Extended Data Fig. 6a, rapamycin treatment did not inhibit
p-S6K (T389) and p-S6 (S240/244 and S235/236) in the mTOR F2039S
cells as observed in the MCF-7 cells (Fig. 1b). Moreover, these cells
displayed a decreased sensitivity to MLN0128 and a combination of
rapamycin and MLN0128, as well as RapaLink-1 as compared to parental
MCF-7 cells (Fig. 3d). Taken together, these data demonstrate that
the binding of RapaLink-1-FKBP12 to the FRB domain is necessary
for simultaneous binding to the ATP site of mTOR and therefore for
RapaLink-1-dependent inhibition of mTOR signalling.

While the design of bivalent inhibitors for therapeutic use has had
mixed success owing to the poor pharmaceutical properties of the
hybrid molecules, FKBP12-binding hybrids have actually been used
to improve the pharmaceutical properties of small-molecule
inhibitors unrelated to TORKi. These FK506-based hybrids exploit the high
intracellular concentration of FKBP12, specifically in blood cells, and
the high affinity of FK506 for FKBP12, to create a reservoir of drug that
prolongs serum half-life. In agreement with the improved
pharmaceutical properties of previous FKBP12-binding hybrids, RapaLink-1
showed prolonged inhibition of mTOR signalling in vitro (Extended
Data Fig. 6b, c), as well as in vivo after a tolerable dose of 1.5 mg kg^-1,
which lasted for over 4 days (Extended Data Fig. 6d, e) and was able
to inhibit the growth of wild-type mTOR MCF-7 xenografts as well as
the current clinical mTOR inhibitors (Extended Data Fig. 6f).

To assess whether RapaLink-1 could block mTOR signalling of
the F2108L mTOR and M2327I mTOR drug-resistant mutants,
MDA-MB-468 cells expressing the alleles were treated with either
rapamycin, MLN0128, a combination of both drugs, or RapaLink-1.
Consistent with the ability of the RapaLink-1-FKBP12 complex to
bind mTOR FRB and kinase-domain mutants (Extended Data Fig. 4b,
lanes 18 and 24), and increased avidity compared to rapamycin
or MLN0128 (Extended Data Fig. 7a), RapaLink-1 at low doses
(3-10 nM) was the only drug regimen capable of inhibiting mTOR
signalling in both F2108L mTOR- and M2327I mTOR-expressing
cells (Fig. 4a, b). Mouse xenografts of MCF-7 cells expressing the RR1
mutant A2034V mTOR showed significantly less sensitivity to
rapamycin yet maintained full sensitivity to AZD8055 and RapaLink-1
treatment (Extended Data Fig. 7b and Fig. 4c). Similarly, xenografts with
MCF-7 cells expressing the TKi-R mutant, M2327I mTOR, showed
significantly less sensitivity to AZD8055 treatment, yet retained full
sensitivity to rapamycin and RapaLink-1 (Extended Data Fig. 7c and
Fig. 4d). The dosing of RapaLink-1 may be limited by toxicity, which
can only be tested in the clinic. However, our preclinical data and that
of others indicate that mTOR kinase inhibitors can be given safely
when administered intermittently and are more effective than daily
dosing schedules.
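
The xenograft results are reported as tumour volume (mm^3) ± s.d. from calliper measurements, but the formula used to convert calliper readings to volume is not stated here. The sketch below therefore assumes the widely used modified-ellipsoid approximation, V = L × W^2 / 2, and the sample standard deviation; both are assumptions on our part, and the measurements in the example are invented for illustration.

```python
import statistics


def tumour_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Approximate tumour volume from two calliper diameters.

    ASSUMPTION: modified-ellipsoid formula V = L * W**2 / 2, with L the
    longer and W the shorter diameter. The source does not state which
    formula was used.
    """
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("diameters must be positive")
    longer, shorter = max(length_mm, width_mm), min(length_mm, width_mm)
    return longer * shorter ** 2 / 2.0


def mean_and_sd(volumes):
    """Group mean and sample standard deviation (n - 1 denominator, assumed)."""
    return statistics.mean(volumes), statistics.stdev(volumes)


# Invented calliper readings (length, width in mm) for one group of n = 5 mice:
readings = [(10.0, 8.0), (9.0, 7.0), (11.0, 8.5), (8.0, 6.5), (10.5, 7.5)]
volumes = [tumour_volume_mm3(length, width) for length, width in readings]
mean, sd = mean_and_sd(volumes)
print(f"tumour volume: {mean:.0f} ± {sd:.0f} mm^3 (n = {len(volumes)})")
```

Repeating this calculation for each group at each measurement day gives growth curves of the kind shown in Fig. 4c, d.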

It is reasonable to anticipate that patients bearing hyperactive MTOR 
kinase domain mutations, who originally respond to rapalogs, may 
eventually relapse owing to the emergence of a second FRB mutation,
as previously observed. To test whether RapaLink-1 would be an
effective mTOR inhibitor in this case, MDA-MB-468 cells express- 
ing F2108L/M2327I mTOR mutations were generated. As expected, 
mTOR substrates were resistant to rapamycin, MLN0128 and to 
a combination of both treatments in the F2108L/M2327I double- 
mutant cells. Yet, the signalling of these double-mutant cells remained 
as sensitive as the mTOR wild-type cells to RapaLink-1 treatment 
(Fig. 4e and Extended Data Fig. 7d). 

Through exploitation of both the ATP- and the FRB-binding sites of 
mTOR, we have developed a new class of mTOR inhibitor that potently 
inhibits tumour growth and signalling in wild-type mTOR-expressing 
cells as well as in cells that have acquired resistance to rapalogs or ATP- 
competitive inhibitors, or both. Such inhibitors have been developed 
for G-protein-coupled receptors” (termed bitopic ligands) but have 
not been exploited in protein kinase inhibitor design. Interestingly, 
the only other bitopic kinase inhibitor we are aware of is the natural 
CDK2/cyclin-A inhibitor p27. The peptidic inhibitor spans the cyclin 
box and extends into the ATP site of CDK2, creating a high-affinity, 
highly specific inhibitor’. Perhaps other allosteric sites near the 
ATP pocket on kinases could be similarly exploited, such as the PIF 
pocket”, or it might even be possible to bridge two adjacent ATP 
pockets in kinase complexes such as KSR-MEK”>. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

Cell culture and reagents. All cell lines were obtained from the American Type 
Culture Collection (ATCC). MCF-7 and MDA-MB-468 (ATCC catalogue num- 
bers HTB-22 and HTB-132, respectively) breast cancer cell lines were maintained 
in a 1:1 mixture of DMEM:F12 medium supplemented with 4 mM glutamine, 
100 units ml⁻¹ each of penicillin and streptomycin, and 10% heat-inactivated 
fetal bovine serum (FBS) and incubated at 37 °C in 5% CO2. The MDA-MB-468 
inducible expression cells were maintained in the same medium with addition of 
50 µg ml⁻¹ hygromycin and 0.2 µg ml⁻¹ puromycin. The HEK-293 cells (ATCC 
catalogue number CRL-1573) were maintained in DMEM medium with glutamine, 
antibiotics and 10% FBS. The cell lines tested negative for mycoplasma contamination. 
AZD8055 was obtained from AstraZeneca Pharmaceuticals; rapamycin was 
purchased from EMD Bioscience. RAD001, KU006, WY354, PP242 and MLN0128 
were purchased from Tocris. Doxycycline was purchased from Sigma-Aldrich. 
Puromycin and hygromycin stock solutions were purchased from Invitrogen. Drugs 
were dissolved in DMSO to yield 10 mM stocks and stored at −20 °C. 

Selection of drug-resistant clones. Cell lines resistant to rapamycin (RR1 and 
RR2) and AZD8055 (TKi-R) were generated by exposing the parental breast cancer 
cell line MCF-7 to a high dose of drug (500 nM of either rapamycin or AZD8055) 
for 3 months of continuous drug exposure (media changed every 3 days); the cells 
were then sent for sequencing. MCF-7-RapaLink-1-resistant cells were generated 
by exposing MCF-7 cells to RapaLink-1 (10 nM) for 9 months of continuous drug 
exposure (media changed once per week); the cells were then sent for sequencing. 

Genomic DNA sequencing. We profiled genomic alterations in 279 key 
cancer-associated genes using our integrated mutation profiling of actionable 
cancer targets (IMPACT) assay, which utilizes solution-phase hybridization-based 
exon capture and massively parallel DNA sequencing. Custom oligonucleotides 
were designed to capture all protein-coding exons and select introns of 279 com- 
monly implicated oncogenes, tumour suppressor genes, and members of pathways 
deemed actionable by targeted therapies. We prepared barcoded sequence libraries 
(New England Biolabs, Kapa Biosystems) for DNA from the MCF-7 parental cell 
line and drug-resistant subclones, and we performed exon capture on barcoded 
pools by hybridization (Nimblegen SeqCap). Two-hundred and fifty nanograms 
of genomic DNA was input for library construction. Libraries were pooled at 
equimolar concentrations (100 ng per library), combined with barcoded libraries 
from a separate project, and input to a single exon capture reaction as previously 
described”®. DNA was subsequently sequenced on an Illumina HiSeq 2000 to 
generate paired-end 75-bp reads. We achieved a mean unique sequence coverage 
of 487× per sample. 

Method for construction of the docking model. The coordinates of the crystal 
structure of rapamycin with FKBP12 and the FRB domain were retrieved from the 
PDB (accession 1FAP). The coordinates of the crystal structure of the ATP-competitive 
mTOR inhibitor PP242 with mTORΔN and mLST8 were retrieved from the PDB 
(accession 4JT5). The two co-crystal structures were aligned, and the amino acids and 
water molecules in the FRB domain of 1FAP were deleted. Because no co-crystal 
structure is available for the ATP-competitive mTOR inhibitor MLN0128, its coordinates 
were manually constructed by modifying those of the co-crystal structure of its 
analogue (PP242). The resulting model containing rapamycin and MLN0128 
was energy-minimized using the MMFF94x force field in Molecular Operating 
Environment (MOE; described later)'* to provide a template structure. During 
the minimization procedure, the following conditions were adopted: the dielectric 
constant was set to 4 × r, where r is the interatomic distance; the residues that 
are 9 Å away from the compound were fixed; and atomic charges for the protein and 
the compounds were set according to the AMBER99 and the AM1-BCC methods, 
respectively. A crosslinker tethering rapamycin with an ATP-competitive mTOR 
inhibitor was manually constructed and energy-minimized using the MMFF94x 
force field in MOE to provide the initial conformation. The obtained initial con- 
formation was subjected to conformation search using LowModeMD in MOE 
(the iteration limit was set to 30). The potential energies (kcal mol⁻¹) of 
the automatically generated conformations were averaged. 
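
The distance-dependent dielectric setting quoted above (4 × r) can be illustrated with a short sketch. This is not MOE's implementation: it evaluates a textbook Coulomb term between two point charges and averages a list of conformer energies. The function names, the Coulomb constant (332.0636 kcal Å mol⁻¹ e⁻²) and any example values are assumptions introduced here for illustration only.

```python
from statistics import mean
from typing import Sequence

# Coulomb constant in kcal * angstrom / (mol * e^2). A commonly used
# molecular-mechanics value, assumed here; it is not taken from the paper.
COULOMB_KCAL = 332.0636


def distance_dependent_dielectric(r_angstrom: float, scale: float = 4.0) -> float:
    """Return epsilon = scale * r, i.e. the '4 x r' setting quoted in the text."""
    if r_angstrom <= 0:
        raise ValueError("interatomic distance must be positive")
    return scale * r_angstrom


def coulomb_energy(q1: float, q2: float, r_angstrom: float) -> float:
    """Textbook Coulomb energy (kcal/mol) of two point charges with epsilon = 4r."""
    eps = distance_dependent_dielectric(r_angstrom)
    return COULOMB_KCAL * q1 * q2 / (eps * r_angstrom)


def average_potential_energy(energies_kcal_mol: Sequence[float]) -> float:
    """Average the potential energies of a set of generated conformations."""
    return mean(energies_kcal_mol)
```

With ε proportional to r, the electrostatic term falls off as 1/r², which damps long-range charge interactions in the absence of explicit solvent.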

Code availability. MOE (version 2013.0801, Chemical Computing Group, 
Montreal, Canada), Scientific Vector Language (SVL) source. 

Cell proliferation assay. The effect of the drug on cell proliferation was deter- 
mined using a CellTiter-Glo Luminescent Cell Viability Assay kit (Promega), 
which is based on quantification of the cellular ATP level. Cells were plated in 
96-well plates at a density of 2,000-5,000 cells (8 replicates per condition). The 
following day, cells were treated with a range of drug concentrations prepared by 
serial dilution. After 1-3 days of treatment, 100 µl of prepared reagent was added 
to each well. The contents of the wells were mixed on a plate shaker for 1 h, and 
then luminescence was measured by an Analyst AD (Molecular Devices). The relative 
growth was normalized to the untreated samples in each group. The growth 
or inhibition curves and IC50 values were calculated with GraphPad Prism v.6. 
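
The normalization step described above can be sketched as follows. This is a minimal illustration, assuming that relative growth is the mean luminescence of the replicate wells at a given dose divided by the mean of the untreated wells in the same group; the function names and data layout are hypothetical, and the curve fitting itself (done by the authors in GraphPad Prism) is not reproduced.

```python
from statistics import mean
from typing import Dict, Sequence


def relative_growth(treated: Sequence[float], untreated: Sequence[float]) -> float:
    """Mean signal of treated wells divided by the mean of untreated wells."""
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated signal must be positive")
    return mean(treated) / baseline


def growth_curve(
    readings_by_dose: Dict[float, Sequence[float]],
    untreated: Sequence[float],
) -> Dict[float, float]:
    """Map each dose to its relative growth, ordered by increasing dose."""
    return {
        dose: relative_growth(wells, untreated)
        for dose, wells in sorted(readings_by_dose.items())
    }
```

Each dose would carry the 8 replicate readings mentioned in the text; the resulting dose-to-growth mapping is the input a curve-fitting tool would use to estimate IC50.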


Immunoblot analysis. Cells were washed with PBS once, disrupted on ice for 
30 min in NP-40 (50 mM Tris (pH 7.4), 1% NP-40, 150 mmol l⁻¹ NaCl, 40 mmol l⁻¹ 
NaF) or RIPA lysis buffer (Thermo Scientific) supplemented with protease and 
phosphatase inhibitors (Pierce Chemical) and cleared by centrifugation. Protein 
concentration was determined with BCA reagent from Pierce. Equal amounts of 
protein (10 to 50 µg) in cell lysates were separated by SDS-PAGE, transferred to 
nitrocellulose membranes (GE Healthcare), immunoblotted with specific primary 
and secondary antibodies and detected by chemiluminescence with the ECL detection 
reagents from Amersham Biosciences. Antibodies for p-AKT (S473) (#4060L), 
p-p70S6K (T389) (#9234L), p-S6 (S240/244) (#5364L), p-S6 (S235/236) 
(#4858L), p-4EBP1 (T37/46) (#9459L), p-4EBP1 (S65) (#9451L), p-4EBP1 (T70) 
(#9455L), β-actin (#4970S), mTOR (#2972S) and raptor (#2280S) were purchased 
from Cell Signaling Technology. The Flag (#F1804) antibody was purchased from 
Sigma. The GST antibody (#sc-138) was from Santa Cruz. 

Retrovirus-based gene-inducible expression cell system. The mTOR genes were 
sub-cloned into TTIGFP-MLUEX vector harbouring tet-regulated promoter. 
Mutations were introduced by using the site-directed Mutagenesis Kit (Stratagene) 
as previously described”’. The retroviruses encoding the rtTA3 or MTOR genes 
were packaged in Phoenix-AMPHO cells. The medium containing virus was filtered 
with 0.45 µm PVDF filters followed by incubation with the target cells for 6 h. The 
cells were then cultured in virus-free medium for 2 days. The cells were selected 
with puromycin (2 µg ml⁻¹) or hygromycin (500 µg ml⁻¹) for 3 days. The positive 
infected cell populations were further sorted with transiently expressed GFP 
marker after being exposed to 1 µg ml⁻¹ doxycycline and the sorted positive cells 
were cultured and expanded in medium without doxycycline but with antibiotics 
at a maintaining dose until the following assays. 

Transient transfections. Cells were seeded in 60-mm or 100-mm plates and transfected 
the following day using Lipofectamine 2000 (Invitrogen) according to the 
manufacturer's instructions. The ratio between DNA and Lipofectamine was 1 µg 
DNA to 3 µl Lipofectamine. 

In vitro FKBP12 binding assay. Cells expressing Flag-tagged wild-type or mutant 
mTOR were collected and lysed with 0.3% CHAPS buffer. The mTOR complexes 
were pulled down with anti-Flag-antibody-conjugated agarose. Then, the bead- 
bound complexes were incubated with recombinant FKBP12 (Fisher Scientific) 
(250nM) or FKBP12 (250nM) and rapamycin (250nM) at 4°C for 30 min. After 
incubation, the beads were washed five times with CHAPS buffer. The protein 
complexes were eluted with 1× Laemmli buffer and assayed by western blotting. 

Sanger sequencing. The complementary DNA was generated from messenger RNA 
isolated from cell pellets with SV total RNA isolation kit, SV minipreps DNA 
purification kit and ImProm-II Reverse Transcription System kit from Promega. 
The mTOR cDNA was amplified with the oligonucleotides listed in Supplementary 
Methods. The PCR products were subjected to gel purification and sequenced by 
Genewiz. 

Mutagenesis. All the mTOR mutants were generated by QuikChange II Site- 
Directed Mutagenesis Kit obtained from Agilent and confirmed by Sanger 
sequencing. 

mTOR in vitro kinase assay. Active mTOR kinases were expressed in 293 cells 
and isolated by immunoprecipitation with anti-Flag beads in 0.3% CHAPS buffer. 
The AKT recombinant protein was acquired from AstraZeneca Pharmaceuticals. 
The in vitro kinase assay was performed with 250 µM ATP at 37 °C for 20 min. 

In vitro kinase inhibition assay. Concentration-response curves with a concentration 
range of 1,000 to 0.97 nM and twofold serial dilution were constructed 
by dispensing a 100 µM DMSO-solubilized stock of AZD8055 into white 384-well 
medium-binding microplates (Greiner Bio-One) using an HP D3000 Digital 
Dispenser. The kinase reaction was performed as described earlier using 200 µM 
ATP, 1.5 µM peptide substrate and either 5 nM wild-type mTOR or 2 nM M2327I 
mutant mTOR. The IC50 values were calculated from initial rate data before being 
corrected for competition with ATP using the Cheng-Prusoff equation and 
assuming the compound is fully ATP competitive. The IC50 values were determined 
by fitting to a standard four-parameter logistic using GraphPad Prism v.5. 
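
The calculations named in this paragraph can be sketched in a few lines. The example below is illustrative only and assumes three things that the text does not state: that the 1,000 to 0.97 nM range corresponds to 11 twofold steps (1,000/2¹⁰ ≈ 0.977), that the Cheng-Prusoff correction takes its usual form for a fully competitive inhibitor, Ki = IC50 / (1 + [ATP]/Km), and that the four-parameter logistic uses the common bottom/top/IC50/Hill-slope parameterization. The ATP Km is a hypothetical input; no value is given in the source.

```python
from typing import List


def twofold_series(top_nm: float, n_points: int) -> List[float]:
    """Concentrations from top_nm downwards, halving at each step."""
    return [top_nm / 2 ** i for i in range(n_points)]


def cheng_prusoff_ki(ic50: float, atp_um: float, km_atp_um: float) -> float:
    """Correct an IC50 for ATP competition: Ki = IC50 / (1 + [ATP] / Km)."""
    if km_atp_um <= 0:
        raise ValueError("Km must be positive")
    return ic50 / (1.0 + atp_um / km_atp_um)


def four_parameter_logistic(
    conc: float, bottom: float, top: float, ic50: float, hill: float
) -> float:
    """Response predicted by a standard 4PL inhibition curve."""
    if conc < 0 or ic50 <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)
```

For instance, `twofold_series(1000.0, 11)` ends at 0.9765625 nM, consistent with the lower bound quoted above, and the 4PL response at `conc == ic50` lies halfway between `bottom` and `top`.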

Animal studies. All in vivo studies were conducted in accordance with guidelines 
approved by the Memorial Sloan-Kettering Cancer Center (MSKCC) Institutional 
Animal Care and Use Committee (IACUC). The maximal tumour volume permitted 
by the MSKCC IACUC is 2,000 mm³; this limit was not exceeded in any of the 
experiments. Eight-week-old athymic nu/nu female mice (Harlan Laboratories) 
were injected subcutaneously with 10 million cells together with matrigel (BD 
Biosciences). 17β-Oestradiol pellets (0.72 mg/90 days release) (Innovative Research 
of America) were implanted subcutaneously 3 days before tumour cell inoculation. 
Once tumours reached an average volume of 100 mm³, mice were randomized 
(n = 5 mice per group) to receive rapamycin (10 mg kg⁻¹), AZD8055 (75 mg kg⁻¹), 
RapaLink-1 (1.5 mg kg⁻¹) or vehicle only as control. Sample size was chosen based 
on previous experiments. Rapamycin was formulated in DMSO and delivered 
intraperitoneally; AZD8055 was formulated in 30% captisol and administered 
orally; RapaLink-1 was formulated in 6% DMSO and 30% captisol and delivered 
intraperitoneally. Mice treated with RapaLink-1 were also given subcutaneous 
saline injections twice a day, along with water supplemented with 5% glucose. 
Tumours were measured twice weekly using callipers, and tumour volume was 
calculated using the formula: length × width² × 0.52. Samples were lysed and processed 
as previously described. 
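
The tumour volume formula above is simple enough to express directly. The sketch below is a minimal illustration; the function names are hypothetical, and the percentage helper assumes that "percentage tumour volume" means volume relative to a baseline measurement, which the text does not spell out.

```python
# Maximal tumour volume permitted by the IACUC, as quoted in the text.
MAX_PERMITTED_VOLUME_MM3 = 2000.0


def tumour_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Calliper-based volume estimate: length x width^2 x 0.52."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("calliper measurements must be non-negative")
    return length_mm * width_mm ** 2 * 0.52


def within_permitted_limit(volume_mm3: float) -> bool:
    """True if the volume does not exceed the permitted maximum."""
    return volume_mm3 <= MAX_PERMITTED_VOLUME_MM3


def percent_of_baseline(volume_mm3: float, baseline_mm3: float) -> float:
    """Volume expressed as a percentage of a baseline volume (assumed definition)."""
    if baseline_mm3 <= 0:
        raise ValueError("baseline volume must be positive")
    return 100.0 * volume_mm3 / baseline_mm3
```

A tumour measuring 10 mm by 5 mm gives roughly 130 mm³, just above the 100 mm³ average at which mice were randomized.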

Statistical analysis. Results are mean values ± s.d. Investigators were not blinded 
when assessing the outcome of the in vivo experiments. All cellular experiments 
were repeated at least three times. 

Extended Data Figure 1 | Acquired-mTOR mutations promote 
resistance to mTOR inhibitors in MCF-7 cells. a, The RNA from MCF-7 
parental, RR1, RR2 and TKi-R cells was isolated and the polymerase chain 
reaction with reverse transcription (RT-PCR) products were submitted 
to Sanger sequencing at Genewiz. b, MCF-7 parental, RR1, RR2 and 
TKi-R cells were treated with either dimethylsulfoxide (DMSO) or 50 nM 
of RAD001 for 4 h. Immunoblot analyses were performed on mTOR 
effectors. c, d, MCF-7 parental, RR1, RR2 and TKi-R cells were treated 
with either DMSO as a control or 500 nM of either KU006, WY354 or 
PP242 mTOR inhibitors (c), or with different doses of MLN0128 (d) for 
4 h. Immunoblot analyses were performed on mTOR effectors. All cellular 
experiments were repeated at least three times. 

Extended Data Figure 2 | Acquired-mTOR mutations promote 
resistance to mTOR inhibitors in MDA-MB-468 cells. a, b, Dose-dependent 
cell growth inhibition of the MDA-MB-468 cells expressing 
green fluorescent protein (GFP), wild-type mTOR or different mTOR 
variants (A2034V, F2108L and M2327I) upon rapamycin (a) or AZD8055 
treatment (b). Cells were pre-treated for 24 h with doxycycline (1 µg ml⁻¹) 
to induce the expression of exogenous mTOR. The cell growth was 
determined as described in Fig. 1d. c-e, MDA-MB-468 cells expressing 
GFP, wild-type mTOR or different mTOR variants were treated with 
different concentrations of rapamycin (c), AZD8055 (d) or MLN0128 
(e) for 4 h. Immunoblot analyses were performed on mTOR effectors. All 
cellular experiments were repeated at least three times. 

Extended Data Figure 3 | Synthesis of the mTOR bivalent inhibitor 
RapaLink-1. a, Compound design of RapaLink-1, -2, and -3 possessing a 
polyethylene glycol unit of varying lengths. b, Calculated potential energy 
units (U) (kcal mol⁻¹) of modelled compounds of varying methylene 

Extended Data Figure 4 | RapaLink-1 requires FKBP12 for binding 
to the mTOR FRB domain. a, Dose-dependent cell growth inhibition 
curves of the MCF-7 parental cell line treated with rapamycin, MLN0128, 
a combination of rapamycin and MLN0128, or RapaLink-1. The cell 
growth was determined as described in Fig. 1d. b, mTOR-Flag wild type 
and variants were transfected into 293H cells. The mTORC1 complex was 
isolated, and an in vitro competition assay in the presence of FKBP12 was 
performed as described in Fig. 2b. c, MCF-7 cells were treated with either 
DMSO, RapaLink-1 (10 nM), FK506 (10 µM), or a combination of both 
for 24 h, at which time the cells were collected. Immunoblot analyses were 
performed on mTOR signalling. All experiments were repeated at least 
three times. 

Extended Data Figure 5 | RapaLink-1 is a potent mTOR inhibitor in wild-type and mutant mTOR cells. a-d, MCF-7, RR1, RR2 and TKi-R cells were 
treated with different concentrations of rapamycin (a), MLN0128 (b), combination treatment (c) or RapaLink-1 (d) over 3 days. The cell growth was 
determined as described in Fig. 1d. Each dot and error bar on the curves represents mean ± s.d. (n = 8). 

Extended Data Figure 6 | RapaLink-1 has a prolonged intracellular 
half-life in wild-type mTOR cells. a, MCF-7 F2039S cells were treated 
with different concentrations of rapamycin, MLN0128, combination 
treatment or RapaLink-1 for 4 h, at which time the cells were collected. 
Immunoblot analyses were performed on mTOR signalling. b, MCF-7 cells 
were treated for 4 h with either DMSO control, 30 nM of rapamycin, 30 nM 
of MLN0128, a combination of 30 nM of both or 30 nM of RapaLink-1, 
at which time the treatments were washed out three times with 
PBS and fresh media was re-added for the indicated times. Immunoblot 
analyses were performed on mTOR effectors. c, MCF-7 cells were 
treated with 10 nM of RapaLink-1 and collected at the indicated times. 
Immunoblot analyses were performed as described earlier. All experiments 
were repeated at least three times. d, Mice bearing MCF-7 xenograft 
tumours were treated with one single dose of vehicle or RapaLink-1 
(1.5 mg kg⁻¹); tumours were collected at different days after treatment as 
indicated. Immunoblot analyses were performed on mTOR effectors. 
e, The weight of the mice treated in the efficacy study shown in f is 
reported here. f, Mice bearing MCF-7 xenograft tumours were treated as 
described in Fig. 4c (n = 5 for each group). The results were reported as 
percentage tumour volume ± s.d. 

Extended Data Figure 7 | RapaLink-1 is a more potent mTOR inhibitor 
than rapamycin. a, MCF-7 cells were treated for 4 h with either RapaLink-1 
(10 nM) or rapamycin (10 nM) with simultaneous addition of increasing 
doses of either rapamycin (left) or RapaLink-1 (right). Immunoblot 
analyses were performed on mTOR effectors. b, c, Mice bearing RR1 
(b) or TKi-R (c) xenograft tumours were treated for 24 h with a single 
dose of either vehicle, rapamycin (10 mg kg⁻¹), AZD8055 (75 mg kg⁻¹) 
or RapaLink-1 (1.5 mg kg⁻¹) (n = 4 for each group). Immunoblot 
analyses were performed on mTOR effectors. d, MDA-MB-468 cells 
inducibly expressing mTOR wild type were treated with either rapamycin, 
MLN0128, a combination of rapamycin and MLN0128, or RapaLink-1 
for 4 h. Immunoblot analyses were performed on mTOR effectors with 
the indicated antibodies. Rapamycin and MLN0128 panels are the same 
shown for wild type in Extended Data Fig. 2c and e, respectively. 

Extended Data Table 1 | mTOR mutations found in human patient samples 

TCGA-B0-4852-01     | ccRCC (TCGA)                                 | E2033V | FRB    | Baseline 
Wagle et al. 2014   | Thyroid                                      | F2108L | FRB    | Everolimus 
TCGA-DU-6393-01     | Glioma (TCGA)                                | M2327I | Kinase | Baseline 
TCGA-A3-3347-01     | ccRCC (TCGA)                                 | M2327I | Kinase | Baseline 
P-00006559-T01-IMS  | Colorectal Cancer (MSKCC-IMPACT)             | M2327I | Kinase | Baseline 
P-0000645-T01-IM3   | Bladder Urothelial Carcinoma (MSKCC-IMPACT)  | M2327I | Kinase | Baseline 
cecRCC_28           | ccRCC (U Tokyo)                              | M2327I | Kinase | Baseline 
P-0000614-T01-IM3   | Endometrial Cancer (MSKCC-IMPACT)            | M2327V | Kinase | Baseline 

List of some FRB and kinase domain mTOR mutations found in human patient samples. Data were collected 
from the cBioPortal, Memorial Sloan-Kettering Cancer Center (MSKCC). 

Extended Data Table 2 | List of FRB domain mutations found in human patient samples 
Data were collected from the cBioPortal, MSKCC. 
Extended Data Table 3 | List of mTOR kinase domain mutations found in human patient samples 

Ribosome-dependent activation of stringent control 

Alan Brown, Israel S. Fernandez, Yuliya Gordiyenko & V. Ramakrishnan 


In order to survive, bacteria continually sense, and respond to, 
environmental fluctuations. Stringent control represents a key 
bacterial stress response to nutrient starvation that leads to 
rapid and comprehensive reprogramming of metabolic and 
transcriptional patterns. In general, transcription of genes 
for growth and proliferation is downregulated, while those 
important for survival and virulence are upregulated. Amino 
acid starvation is sensed by depletion of the aminoacylated tRNA 
pools, and this results in accumulation of ribosomes stalled with 
non-aminoacylated (uncharged) tRNA in the ribosomal A site. 
RelA is recruited to stalled ribosomes and activated to synthesize a 
hyperphosphorylated guanosine analogue, (p)ppGpp, which acts as 
a pleiotropic secondary messenger. However, structural information 
about how RelA recognizes stalled ribosomes and discriminates 
against aminoacylated tRNAs is missing. Here we present the 
cryo-electron microscopy structure of RelA bound to the bacterial 
ribosome stalled with uncharged tRNA. The structure reveals that 
RelA utilizes a distinct binding site compared to the translational 
factors, with a multi-domain architecture that wraps around a highly 
distorted A-site tRNA. The TGS (ThrRS, GTPase and SpoT) domain 
of RelA binds the CCA tail to orient the free 3’ hydroxyl group of the 
terminal adenosine towards a β-strand, such that an aminoacylated 
tRNA at this position would be sterically precluded. The structure 


Figure 1 | Structure of RelA bound to the ribosome. a, Overall view 
of RelA in complex with a ribosome stalled with an uncharged tRNA in 
the A site. Displayed are the 50S and 30S ribosomal subunits; E-, P- and 
A-site tRNAs; mRNA, and RelA coloured by domain. b, Structure of the 
ribosome-bound form of RelA oriented from N to C termini with the 
supports a model in which association of RelA with the ribosome 
suppresses auto-inhibition to activate synthesis of (p)ppGpp and 
initiate the stringent response. Since stringent control is responsible 
for the survival of pathogenic bacteria under stress conditions, and 
contributes to chronic infections and antibiotic tolerance, RelA 
represents a good target for the development of novel antibacterial 
therapeutics. 

Stringent control is a pleiotropic response to the failure of amino 
acid availability to keep up with the demands of protein synthesis. 
It is mediated by a hyperphosphorylated nucleotide ((p)ppGpp). 
In E. coli, the synthesis of (p)ppGpp is catalysed by RelA, a multi-domain 
ATP:GTP(GDP) pyrophosphate transferase, and a prototypic 
member of the RelA/SpoT homologue (RSH) family. The majority 
of multi-domain RSH proteins are stimulated to generate (p)ppGpp 
in a ribosome-dependent manner when an uncharged and cognate 
tRNA, which acts as a marker for nutrient deficiency, is located in 
the ribosomal A site. Discrimination against aminoacylated tRNA 
prevents undesired activation of stringent control during the normal 
translation cycle. 

Using cryo-electron microscopy we have solved the structure of the 
E. coli ribosome, programmed so that uncharged tRNA(Phe) occupies 
the A site, in complex with RelA at an overall resolution of 3.0 Å (Fig. 1, 
Extended Data Figs 1 and 2, and Extended Data Table 1). We did not 
domain organization below showing the boundaries of the hydrolase 
(HYD), synthetase (SYN), TGS, zinc-finger (ZFD) and RNA recognition 
motif (RRM) domains. Unmodelled flexible elements that connect RelA 
domains are indicated with dashed lines. 
observe any class in which RelA was bound to the ribosome in the 
absence of A-site tRNA. Both RelA and the A-site tRNA remain flexible 
when bound to the ribosome, primarily due to binding intrinsically 
flexible rRNA elements (notably the L7/L12 stalk base and the A-site 
finger). Although there are only minor differences in conformations 
(Extended Data Fig. 1b), the heterogeneity was sufficient to result 
in RelA having less well-resolved density than the ribosome. To 
distinguish conformational states and improve the local map quality 
we used a recent modification of the 3D classification process, 
in which ribosome projections were subtracted from each experimental 
particle, leaving signal only for RelA before classification focused on 
each domain (Methods and Extended Data Fig. 1). This improved the 
density for the RelA domains (Extended Data Figs 2 and 3) allowing 
models to be built (Extended Data Table 2). 

The structure reveals that RelA forms a highly extended confor- 
mation on the ribosome to cradle the uncharged tRNA in a distorted 
conformation in the A site (Fig. 1). RelA has an N-terminal region 
formed by hydrolase (HYD), synthetase (SYN), and TGS domains that 
are located at the acceptor end of the A-site tRNA, and a C-terminal 
region formed by a zinc-finger domain and an RNA recognition motif 
(RRM) that run parallel to the anticodon arm of the tRNA. These five 
domains are connected by flexible and helical elements in a serpentine 
configuration that wind between the ribosome and the A-site tRNA 
(Fig. 1 and Extended Data Fig. 4). In this conformation, RelA inhibits 
accommodation of the acceptor arm of the uncharged tRNA into the 
peptidyl transferase centre. 

As has been previously noted, the overall conformation of the 
A-site tRNA in the presence of RelA resembles the A/T state adopted by 
pre-accommodated aminoacyl-tRNA in complex with EF-Tu (Fig. 2a). 
However, our high-resolution map reveals that the interactions between 
the tRNA and the ribosomal large subunit are very different, with the 
tRNA contacting both the RNA component of the L7/L12 stalk base 
(helices 42-44) and the sarcin-ricin loop (SRL; helix 95) (Fig. 2b). 
A stacking interaction between nucleotides C56 of the tRNA elbow 
and A1067 of the L7/L12 stalk base is reminiscent of how the L1 stalk 
(helices 76-78) recognizes E-site tRNA. The antibiotic thiostrepton 
binds in the vicinity of A1067 and may prevent this interaction, explaining 
its ability to inhibit (p)ppGpp formation. The contacts with rRNA, and 
also with RelA, distort the tRNA compared to the A/T state (Fig. 2a, b 
and Extended Data Fig. 5). Starting at base-pair 27:43 after the aligned 
anticodon stem-loops (ASLs), a 6° rotation away from the ribosome 
is coordinated with an outward movement of the L7/L12 stalk base. 
A further 11° rotation of the acceptor stem, starting at base-pair 7:66, 
allows the tRNA to contact the SRL, which is bound by EF-Tu in the 
decoding complex. 

Once EF-Tu has dissociated from the ribosome, it is not known 
whether the A-site tRNA conformation could still fluctuate or the 
higher affinity of aminoacylated (compared to uncharged) tRNA for 
the A site in the peptidyl transferase centre would stabilize the 
accommodated form of tRNA. However, with an uncharged tRNA, the 
fluctuations in its conformation could bring the acceptor end into 
contact with the RelA TGS domain and stabilize a distorted form in a 
manner that discriminates against aminoacylated tRNAs (Fig. 2c). This 
small domain has a β-grasp fold similar to that found in the ubiquitin 
family. In RelA, an additional pair of C-terminal α-helices, the second 
of which spans across the axis of the acceptor stem, extends the fold 
(TGS α3 and α4, see Extended Data Fig. 6). Highly conserved basic 
residues (Arg487, Lys491, His493, Arg497, and Lys498) at each end 
of this helix form electrostatic interactions with the tRNA phosphate 
backbone (Fig. 2c and Extended Data Fig. 7). The absence of base- 
specific contacts allows RelA to recognize all tRNAs equally. 
The 3′ CCA of the A-site tRNA extends ~14 Å around the outside of 
the TGS domain (Fig. 2d) and is maintained by a series of interac- 
tions; C74 stacks with His432, C75 can form hydrogen bonds with 
Arg438, and crucially A76 stacks beneath Pro411 and Lys412 (Fig. 2e 
and Extended Data Fig. 3a-c). This positions the free 3’ hydroxyl group 
Figure 2 | Molecular basis for the recognition of uncharged A-site 
tRNA. a, Comparison of A-site tRNAs. In the presence of RelA, A-site 
tRNA (purple) structurally resembles the A/T state adopted by pre- 
accommodated aminoacyl-tRNA in complex with EF-Tu (light grey). 
RelA prevents accommodation of tRNA into the canonical A-site position 
(brown). P-site tRNA (green) and mRNA (dark grey) are shown for 
reference. b, Conformational differences between RelA-bound A-site 
tRNA (purple) and A/T-tRNA (light grey) result from an outward position 
of the L7/L12 stalk base and movement of the tRNA to contact the SRL 
(both light blue). c, The TGS domain (teal) binds the acceptor end of the 
A-site tRNA (purple) through a positively-charged helix that interacts with 
the phosphate backbone (expanded view). d, When viewed from the back 
as indicated in c, the 3’ CCA (nucleotides 74-76) of the A-site tRNA wraps 
around the surface of the TGS domain. e, The conformation of the CCA is 
maintained by interactions with invariant residues of the TGS domain. 

f, The free 3’ OH of the terminal adenine is positioned to face the 
β5-strand of the TGS domain to sterically preclude the binding of 
aminoacylated tRNAs. 


of the terminal adenosine towards the β5-strand of the TGS domain 
(Fig. 2e, f). An aminoacylated tRNA at this position would be sterically 
precluded (Extended Data Fig. 3b). This agrees with data showing that a 
free 3′ hydroxyl group is a prerequisite for RelA activation. 

As well as binding the A-site tRNA, the TGS domain contacts the 
small subunit rRNA and the ribosomal proteins uS12 and uL14. 
Although ribosomes that lack, or have mutant, uL11 have impaired 
(p)ppGpp synthesis, we do not observe a direct interaction between 
RelA and uL11. 

The TGS domain, together with the N-terminal hydrolase and synthetase 
domains, defines the minimal unit found in ribosome-dependent 
RSHs. In RelA, the two C-terminal domains act in tandem to further 
anchor RelA to the ribosome by binding the ribosomal A-site 
finger (ASF; 23S rRNA helix 38) that spans the inter-subunit space 
Figure 3 | Interactions between RelA and the ribosome. a, Overview 
(left) and details (right) of the interaction between the zinc-finger domain 
(ZFD) (orange) and RRM (blue) of RelA and the ribosomal ASF (light 
blue) that spans the inter-subunit interface between the P-site (green) and 
A-site (purple) tRNAs. RelA acts as an additional inter-subunit bridge by 
binding uL16 (cyan) in the large subunit and uS19 (yellow) in the small 
subunit. PTC, peptidyl transferase centre. b, The α-helix of the ZFD binds 
in the major groove of the ASF, with the zinc-binding site interacting with 
the phosphate backbone. c, By binding to uS19, the ZFD occupies the 
position adopted by the ASF in the rotated ribosome (grey). 


(Fig. 3a and Extended Data Fig. 7). Although the C terminus was known 
to be involved in binding to the ribosome, the nature of the interaction 
is unexpected and was not previously noted in the low-resolution 
reconstruction. To our knowledge, this represents a novel binding site 
on the ribosome for an extrinsic factor. 

RelA interacts with the tip of the ASF through an unpredicted, and 
unusual, CCHC-type zinc-finger domain. This domain has a 338008 
topology (Extended Data Fig. 6), with a single zinc ion co-ordinated 
at the N terminus of the α1-helix (Fig. 3b). The zinc-binding residues, 
Cys612 and His634, directly contribute to binding the ASF by interacting 
with the phosphate backbone of nucleotide U884 and help to 
orient the α1-helix in the major groove of the ASF. The ASF usually 
forms a dynamic bridge (B1a) with either uS13 or uS19 depending on 
the ratcheted state of the ribosome. RelA stabilizes the bridge with 
uS13 by binding to the same interface of uS19 as occupied by the ASF 
in the rotated ribosome, which results in a better-defined ASF than in 
any previous ribosome structure (Fig. 3c). 

The C-terminal domain is sandwiched between the upper part of the 
ASF and the A-site tRNA, and contacts the large subunit protein, uL16 
(Fig. 3a). Although this domain had been predicted as an ACT fold!!, 
the interaction between the face of its four-stranded antiparallel β-sheet 
and the ASF is more reminiscent of an RNA recognition motif (RRM), 
which shares the same topology (Extended Data Fig. 8). Furthermore, 
conjoint RRM/zinc-finger domains are common in eukaryotes”. 

In contrast to the C-terminal domains, the hydrolase and catalytic 
synthetase domains form few contacts with the ribosome. The domains 
are extremely flexible, and occupy multiple positions between the SRL 
and the spur of the small subunit (16S rRNA helix 6). Using our focused 
classification approach we isolated a few well-populated states that were 
sufficiently resolved to orient a model based on a crystal structure of a 
homologue” (Extended Data Fig. 1a). Although the synthetase domain 
can, in some classes, contact the tip of the spur (nucleotides 84-88), 
the general absence of interactions with the ribosome suggests that 
RelA activation is indirect. Rather, as the isolated catalytic domain of 
RelA is constitutively active in a ribosome-independent manner, 
our structure supports a model where association with the ribosome 
and uncharged A-site tRNA suppresses regulatory auto-inhibition 
that results from RelA homodimers or oligomers”), as previously 
suggested’’. This contrasts with activation of the translational GTPases 
in which the SRL has a direct role in inducing catalysis*®. Our structure 
should provide a framework for the design of experiments to differen- 
tiate among the various proposed mechanisms for RelA?”?””?. 

In conclusion, our structural data reveal how RelA specifically 
recognizes ribosomes stalled under conditions of amino acid starvation 
to activate synthesis of (p)ppGpp and initiate the stringent response. By 
using the ribosome as a signalling platform, RelA provides an immedi- 
ate link between the status of translation and global adaptation to the 
environment. As the distribution of the RSH family is strictly limited 
to bacteria, and stringent control contributes to the virulence, persis- 
tence, and antibiotic tolerance of bacterial infections, the structure 
can provide a framework for the development of therapeutics that can 
selectively inactivate stringent control and re-sensitize resilient bacteria 
to antibiotics*°. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

RelA purification. The full-length E. coli RelA protein containing an in-frame 
N-terminal octahistidine tag and the recognition signal for tobacco etch virus 
(TEV) protease was recombinantly expressed in E. coli BL21 cells, as previously 
described". In a modification to the reported procedure, the protein was expressed 
for 3h at 37°C after induction. Cells were collected by centrifugation at 4°C and 
lysed by sonication in ice-cold lysis buffer (50 mM Tris pH 7.5, 1 M KCl, 1 mM 
MgCl2, 15 mM imidazole, 2 mM DTT) containing 1× protease inhibitor cocktail 
(Roche). After centrifugation, the supernatant of the lysate was applied to a 5 ml 
HisTrap HP column (GE Healthcare) equilibrated in lysis buffer. After copious 
washing, a linear gradient of lysis buffer supplemented with 450 mM imidazole 
was used to elute the histidine-tagged RelA protein. RelA was dialysed against 
2 l of buffer A (20 mM Tris pH 7.5, 0.6 M KCl, 1 mM MgCl2 and 2 mM DTT) at 4°C 
for 2h in the presence of TEV protease in a 1:200 mass ratio with RelA. A passage 
through a 5 ml HisTrap HP column equilibrated in buffer A was performed in a 
gradient of 0-450 mM imidazole to remove TEV protease and cleaved His tag. 
The early-eluting peak corresponding to digested RelA was used immediately to 
prepare complexes for freezing on cryo-EM grids to complete the procedure within 
a single day. 

Sample preparation for electron microscopy. Ribosomes were purified from 50 g 
of E. coli MRE 600 cells grown to mid-log phase. Cells were disrupted by sonication 
in buffer R200 (20 mM Tris-HCl pH 7.5, 200 mM NH4Cl, 10 mM Mg(OAc)2, 
6 mM β-mercaptoethanol, 0.1 mM benzamidine, and 0.1 mM PMSF). Cell debris 
was removed by two rounds of centrifugation at 30,000g for 30 min. Ribosomes 
were pelleted through a 20 ml 1.1 M sucrose cushion using a 45 Ti rotor (Beckman 
Coulter) at 205,000g for 18 h. This process was repeated twice in buffer R200, and 
then once in buffer R500 (as R200, but with 500 mM NH4Cl). Between pelleting 
steps, pellets were dissolved in the appropriate buffer by gentle shaking for 2 h on 
ice. The final pellet was dissolved in buffer R60 (20 mM Tris pH 7.5, 60 mM NH4Cl, 
10 mM Mg(OAc)2, 6 mM β-mercaptoethanol). 

To remove non-ribosomal material and isolated subunits, ~208 OD260 of sample 
were loaded on each of six 15-30% sucrose gradients in buffer R60 and centrifuged 
at 58,000g for 18 h with a SW28 rotor (Beckman Coulter). The peak fractions 
corresponding to 70S ribosomes were pooled and diluted to 180 ml with buffer 
G (5 mM HEPES pH 7.5, 50 mM KCl, 10 mM NH4Cl, 10 mM Mg(OAc)2, 6 mM 
β-mercaptoethanol), before a further pelleting step at 205,000g for 18 h. After 
washing with buffer G, the pellets were resuspended to a final concentration of 
~6.8 µM in buffer G. Aliquots of 20 µl were flash-cooled in liquid nitrogen and 
stored at −80°C. 

Aminoacylated fMet-tRNAfMet and uncharged tRNAPhe were produced as previously 
reported’. A modified version of Z46C mRNA" was chemically synthesized 
(GE Dharmacon) to include six codons for Phe (UUC) after the initiator fMet 
codon (AUG). 

Complexes were formed by incubating 100 nM ribosome with the step-wise addition 
of 1 µM mRNA in buffer G100 (5 mM K-HEPES pH 7.0, 100 mM KCl, 10 mM 
NH4Cl, 10 mM Mg(OAc)2, 6 mM β-mercaptoethanol), 200 nM fMet-tRNAfMet, 
and 800 nM tRNAPhe. The complexes were incubated for 5 min at 37°C between 
each addition. Separately, 500 nM RelA was pre-incubated with 1 µM AMPCPP 
and 1 µM GDP for 10 min at 37°C, and then added to the ribosome complex to a 
final volume of 100 µl. To stabilize the codon-anticodon interaction of the A-site 
tRNA, 1 µM paromomycin was added. After a further 5 min incubation, the sample 
was cooled to 4°C and used immediately to prepare grids for electron microscopy. 
Electron microscopy. Aliquots of 3 µl of the RelA complex were incubated for 20 s 
on glow-discharged holey carbon grids (Quantifoil R2/2), on which a ~30 Å-thick 
custom-made amorphous carbon film had previously been deposited. The grids 
were blotted for 5 s in 100% humidity at 4°C before being flash-cooled in liquid 
ethane using a Vitrobot MKII (FEI). 

Grids were transferred to a Polara G2 microscope (FEI) operated at 300 kV. 
Images were recorded with the EPU automated data acquisition software on a 
Falcon III direct electron detector (FEI) at a calibrated magnification of 104,478× 
(yielding a pixel size of 1.34 Å). Images were collected with a total dose of 
35 e− per Å2 and a defocus range of −1.8 to −3.0 µm. A bespoke system was 
used to intercept the videos from the detector at a speed of 30 frames for the 
1.1-s exposures. 
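
The imaging parameters above can be cross-checked with simple arithmetic: the pixel size at the specimen is the physical detector pixel divided by the calibrated magnification. The sketch below is only an illustration. It assumes a 14 µm physical pixel for the detector (a value not stated in the text) and spreads the total dose evenly over the 30 frames, which is a simplification.

```python
def pixel_size_angstrom(detector_pixel_um: float, magnification: float) -> float:
    """Pixel size at the specimen level, in angstroms (1 um = 1e4 A)."""
    return detector_pixel_um * 1e4 / magnification


def dose_per_frame(total_dose: float, n_frames: int) -> float:
    """Average dose per frame, assuming the dose is spread evenly."""
    return total_dose / n_frames


DETECTOR_PIXEL_UM = 14.0   # assumption: physical pixel size, not given in the text
MAGNIFICATION = 104_478    # calibrated magnification from the text
TOTAL_DOSE = 35.0          # electrons per square angstrom, from the text
N_FRAMES = 30              # frames per exposure, from the text

print(f"pixel size: {pixel_size_angstrom(DETECTOR_PIXEL_UM, MAGNIFICATION):.2f} A")
print(f"dose per frame: {dose_per_frame(TOTAL_DOSE, N_FRAMES):.2f} e-/A^2")
```

Under the 14 µm assumption the computed pixel size rounds to the 1.34 Å quoted above.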

Image processing. All micrographs that showed signs of astigmatism, substantial 
drift, or poor ice were discarded. The frames of the remaining micrographs were 
aligned using whole-image motion correction*! to reduce beam-induced blurring. 
Parameters of the contrast transfer function for each motion-corrected micrograph 
were obtained using Gctf**. The interactive semi-automatic swarm tool in the 
e2boxer.py program of EMAN2*? was used to select particles from the images. 
We used reference-free two-dimensional class averaging in RELION* to discard 
non-ribosomal particles and particles of isolated subunits. After two-dimensional 
classification, Euler angles for each particle were assigned using three-dimensional 
refinement in RELION using a 60 Å low-pass-filtered reconstruction of 
EMDB-2373' as an initial reference. 

This resulted in a consensus reconstruction in which density for tRNAs and 
RelA was apparent, but at lower occupancy than the ribosome. To enrich for 
particles containing RelA, we employed focused classification with signal subtraction 
(FCwSS)"” using a mask over the A-site tRNA and RelA. This yielded a class in 
which the A site was fully occupied. The movements of particles within this class 
were corrected in RELION and the contributions of each frame weighted using a 
resolution-dependent radiation damage model*° to generate ‘polished particles’. 
Polished particles from different data sets were commingled and further classified 
and refined to yield a final data set of 164,353 particles with a nominal resolution 
of 3.0 Å (Extended Data Table 1). Owing to on-ribosome conformational heterogeneity, 
subsequent rounds of FCwSS were used to isolate different states of RelA. 
Prior to visualization, all density maps were corrected for the modulation transfer 
function (MTF), and then sharpened by applying a B-factor that was estimated 
using automated procedures**. Local resolution was quantified with ResMap*” 
(Extended Data Fig. 2). 
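
B-factor sharpening of this kind amounts to rescaling Fourier amplitudes by a resolution-dependent exponential. The sketch below assumes the commonly used convention exp(−B/(4d²)), where d is the resolution in Å and a negative B sharpens the map. The B value is purely hypothetical and is not taken from this study.

```python
import math


def sharpening_factor(b_factor: float, resolution: float) -> float:
    """Amplitude scale factor exp(-B / (4 d^2)) at resolution d (angstroms)."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return math.exp(-b_factor / (4.0 * resolution ** 2))


HYPOTHETICAL_B = -100.0  # A^2; illustrative value only, not from the paper

for d in (10.0, 5.0, 3.0):
    # High-resolution terms (small d) are boosted more than low-resolution ones.
    print(f"{d:4.1f} A -> amplitudes x {sharpening_factor(HYPOTHETICAL_B, d):.2f}")
```

With a negative B the scale factor grows towards high resolution, which is why sharpening restores fine detail that is attenuated in the raw reconstruction.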
Model building. The reconstruction was initially interpreted by docking the 
high-resolution crystal structure of the Escherichia coli 70S ribosome (PDB acces- 
sion code 4YBB**) into the map using Chimera*’. Models for uL10, bL9, and 
bL31 were taken from the cryo-EM reconstruction of E. coli 70S in complex with 
EF-Tu (PDB accession code 5AFI”°). Initial models for the tRNAs and mRNA 
were taken from the Thermus thermophilus 70S in complex with EF-Tu (PDB 
accession code 4V5L"'). Density for paromomycin was clearly discernible within 
rRNA helix 44 of the small subunit, and fitted with a model. Unlike previous 
crystal structures”, we do not observe a second paromomycin binding site within 
the large subunit. 

Homology models generated using I-TASSER* were used to guide model 
building for E. coli RelA. A model for the hydrolase and synthetase domains was 
obtained from the crystal structure of the bifunctional catalytic domain from a 
Streptococcus equisimilis RSH”? (PDB accession code 1VJ7). A model for the RRM 
was generated using the structure of the RRM domain from a Chlorobium tepidum 
RSH (PDB accession code 3IBW). A model for the TGS domain was derived from 
the crystal structure of the TGS domain from a Clostridium leptum RSH (PDB 
accession code 3HVZ). The ZFD and connecting elements were built de novo 
(Extended Data Table 2). The fit of all models to the map was optimized using real 
space refinement in Coot. 
Model refinement and validation. Reciprocal space refinement was carried out 
in REFMAC v5.8 optimized for EM maps using external restraints generated by 
ProSMART and LIBG“™. The average Fourier shell correlation (FSCaverage) was monitored during 
refinement and the final model was validated using MolProbity* (Extended 
Data Table 1). Cross-validation against over-fitting was performed as previously 
described****. Figures were generated using PyMOL”, Chimera”, or Coot/ 
Raster3D*. 
Extended Data Figure 1 | In silico 3D classification scheme. a, All 
particles were subjected to 2D classification, from which non-ribosomal 
particles were discarded, before 3D refinement. To isolate particles 
containing A-site tRNA and RelA, 3D classification focused on occupancy 
of the ribosomal A site was performed. Refinement of these 183,615 
particles resulted in a reconstruction with a nominal resolution of 2.9 Å. 
A second round of 3D classification isolated 164,353 well-aligned particles. 
Conformational heterogeneity of the ribosome was resolved by 3D 
classification without alignment, which identified two dominant classes in 
which the body of the small subunit occupies different positions (indicated 
with an arrow). Class 1 was used as the reference for model building, 
refinement, and interpretation. To resolve additional conformational 
heterogeneity of RelA, focused classification with signal subtraction 
(FCwSS) was performed on each domain, with the hydrolase (HYD) and 
synthetase (SYN) domains treated as a single unit. For the RRM, zinc-finger 
and TGS domains a single class was isolated in which the density 
was better resolved than in the reference class. The overall resolutions of 
the reconstructions are reported according to the Fourier shell correlation 
(FSC) = 0.143 criterion. Multiple conformations of the HYD and SYN 
domains were identified, with the four best-resolved classes shown. 
Together these account for 42% of the particles. b, The two main classes 
for the TGS domain provide an example of the small conformational 
differences that were isolated using FCwSS. 
[Figure key: 3D classification; classification without alignment; focused classification with signal subtraction; values in parentheses give the number of particles in each class / resolution. Panel labels: Isolation of domain conformations; HYD/SYN; Class 2.] 
Extended Data Figure 2 | Quality of maps and models. a, FSC curve 
for the EM map. b, The unfiltered and unsharpened density map, in both 
surface and slice view, coloured by local resolution. c, Fit of models to 
maps. FSC curves calculated between the refined model and the final 
map (black), with the self- and cross-validated correlations in blue and 
magenta, respectively. Information beyond 3.0 Å was not used during 
refinement and preserved for validation. d, e, Examples of high-resolution 
features of the map. d, Density for selected rRNA modifications and 
paromomycin. e, Density for the codon-anticodon interaction in the 
A site. e, Unfiltered and unsharpened map of RelA bound to the bacterial 
ribosome, showing the ribosome-binding RelA domains coloured by local 
resolution according to the FCwSS maps (see Extended Data Fig. 1). 
The regions amplified in panels f and g are highlighted. f, The RelA ZFD 
and RRM coloured by local resolution. g, The TGS domain coloured by 
local resolution. 
Extended Data Figure 3 | Examples of RelA density. a, Density for the 
interaction between the 3’ CCA of the A/T-tRNA and the TGS domain. 
b, A modelled tRNA demonstrates that even the smallest aminoacyl 
groups would clash with RelA. The sphere size of the atoms of the 
aminoacyl group corresponds to their van der Waals radii. c, C74 stacks 
with His432 and C75 can potentially interact with Arg438 of the TGS 
domain. d, Density for helix α4 of the TGS domain. e, Density for TGS α3. 
f, Density for the interaction between the ZFD and uS19, showing 
distinctive density for two consecutive histidine residues. g, Example of 
side chain density used for the de novo building of the ZFD. 
Extended Data Figure 4 | RelA domains are connected by flexible 
linkers. Two related views showing the density that connects the RelA 
HYD, SYN and TGS domains with the ZFD/RRM. The linker runs 
between the A/T-tRNA and the ribosome, but remains flexible as 
suggested by the weak and broken density. The figure shows the unfiltered, 
unsharpened density map for the ribosomal small subunit (SSU) of class 4 
(Extended Data Fig. 1) with the large subunit removed for clarity. 
Extended Data Figure 5 | The conformation of uncharged A-site tRNA 
in the presence of RelA is distinct from aminoacylated A/T-tRNA in the 
presence of EF-Tu. a, The ASLs of A-site tRNA (purple) and A/T-tRNA 
(grey) superpose until base-pair 27:43. At this point, the A-site tRNA is 
distorted so that the tRNA elbow regions are separated by a 6° rotation. 
b, A second 11° rotation occurs at base-pair 7:66 of the acceptor stem 
so that the A-site tRNA in the presence of RelA is closer to the 
ribosomal SRL. 
Extended Data Figure 6 | RelA topology diagram. Secondary structure elements for RelA residues 404-740 are numbered separately for each domain. 
Unbuilt sections are shown as dashed lines. Topologies were extracted using Pro-origami”’. 


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 7 | RelA binds RNA through electropositive 
surfaces. a, The ZFD and RRM of RelA act together to recognize the 
ASF of the LSU rRNA. b, As in a, but with the ZFD and RRM shown in 
surface representation coloured by electrostatic potential. c, The RelA 
TGS domain binds the acceptor arm of the A/T-tRNA. d, As in c, but 
with the TGS domain in surface representation coloured by electrostatic 
potential. Electrostatic potentials (in units of kT ec−1) were calculated using APBS*’, where k is 
Boltzmann's constant, T is the temperature of the calculation (310 K) and 
ec is the charge of an electron. 
Extended Data Figure 8 | RelA contains an RNA recognition motif 
(RRM). a, The RRM from RelA binds the ASF (nucleotides 894-899 
shown) through the face of the β-sheet. b, RRMs recognize a wide variety 
of RNA molecules, but share a common fold and a similar protein-RNA 
interface, for example in the interaction between PRP24 and U6 small 
nuclear RNA (PDB accession code 4NOT). 
Extended Data Table 1 | Refinement and model statistics 

Data collection 
  Particles                            98,498 
  Pixel size (Å)                       1.34 
  Defocus mean (µm)                    2.0 
  Defocus range (µm)                   0.5 - 3.9 
  Voltage (kV)                         300 
  Electron dose (e− Å−2)               35 
Model composition 
  Non-hydrogen atoms                   151,800 
  Protein residues                     6,229 
  RNA bases                            4,782 
  Ligands (Zn2+/Mg2+/H2O/PAR)          3/312/2/1 
Refinement 
  Resolution (Å)                       3.0 
  Map sharpening B-factor (Å2) 
  Average B factor (Å2) 
  FSCaverage 
Rms deviations 
  Bond lengths (Å)                     0.005 
  Bond angles (°)                      1.1 
Validation (proteins) 
  Molprobity score                     2.5 (95th percentile) 
  Clashscore, all atoms                6.5 (100th percentile) 
  Good rotamers (%)                    94.2 
Ramachandran plot 
  Favored (%)                          91.3 
  Outliers (%)                         1.3 
Validation (RNA) 
  Correct sugar puckers (%)            99.0 
  Good backbone conformations (%)      78.4 


The RelA HYD/SYN domain (residues 15-353) was modelled as a rigid-body fitted homology model and excluded from refinement. 
Extended Data Table 2 | Modelled residues 


Residues Domain Model details 

15-353 HYD/SYN Rigid body fitted homology model based on PDB 
accession code 1VJ7. 

404-501 TGS Model based on PDB accession code 3HVZ adjusted 
to fit the density. 

539-549 Linker Modeled as a poly(alanine) helix. 

553-571 Linker Modeled as a poly(alanine) helix. 

594-663 ZFD Built de novo. 

664-740 RRM Model based on PDB accession code 3IBW adjusted 
to fit the density. 
Charge-density analysis of an iron-sulfur protein 
at an ultra-high resolution of 0.48 Å 


Yu Hirano1†, Kazuki Takeda1 & Kunio Miki1 


The fine structures of proteins, such as the positions of hydrogen 
atoms, distributions of valence electrons and orientations of bound 
waters, are critical factors for determining the dynamic and chemical 
properties of proteins. Such information cannot be obtained by 
conventional protein X-ray analyses at 3.0-1.5 Å resolution, in which 
amino acids are fitted into atomically unresolved electron-density 
maps and refinement calculations are performed under strong 
restraints!?. Therefore, we usually supplement the information 
on hydrogen atoms and valence electrons in proteins with pre- 
existing common knowledge obtained by chemistry in small 
molecules. However, even now, computational calculation of such 
information with quantum chemistry also tends to be difficult, 
especially for polynuclear metalloproteins®. Here we report a 
charge-density analysis of the high-potential iron-sulfur protein 
from the thermophilic purple bacterium Thermochromatium 
tepidum using X-ray data at an ultra-high resolution of 0.48 Å. 
Residual electron densities in the conventional refinement are 
assigned as valence electrons in the multipolar refinement. Iron 3d 
and sulfur 3p electron densities of the Fe4S4 cluster are visualized 
around the atoms. Such information provides the most detailed view 
of the valence electrons of the metal complex in the protein. The 
asymmetry of the iron-sulfur cluster and the protein environment 
suggests the structural basis of charge storing on electron transfer. 
Our charge-density analysis reveals many fine features around the 
metal complex for the first time, and will enable further theoretical 
and experimental studies of metalloproteins. 

Fine structural information, including the positions of hydrogen 
atoms and distributions of valence electrons, is essential for under- 
standing the full properties of protein molecules; in particular, of 
electron transfer metalloproteins in photosynthesis and respiration. 
However, it is difficult to obtain such information with theoretical 
and experimental analyses. Therefore, experimental visualization of 
valence electrons and hydrogen atoms is of great value in elucidating 
the molecular mechanisms of protein functions. We have previously 
performed crystallographic investigations of photosynthetic proteins 
from T. tepidum**. The high-potential iron-sulfur protein (HiPIP) is 
critical for electron transfer in bacterial photosynthesis. HiPIP from 
T. tepidum consists of 83 residues and possesses one Fe4S4 cluster at the 
centre of the protein. The protein gives high-quality crystals suitable 
for X-ray crystallographic analysis at ultra-high resolution’. It has been 
reported that the redox properties of the Fe4S4 cluster are attributable 
to interactions between the cluster and surrounding ligands, both in 
Fe4S4 and iron-sulfur proteins®” as well as in analogue compounds!”. 
Therefore, we performed an experimental charge-density analysis of 
HiPIP in its reduced state. This is the first case, to our knowledge, of 
charge-density analysis being applied to metalloproteins, although it 
has been used previously for several proteins'!"!*. 

A diffraction data set at 0.48 Å resolution was collected using 
high-energy synchrotron X-rays (λ = 0.45 Å) (Extended Data Table 1 
and Extended Data Fig. 1). This resolution is on a par with the highest-resolution 
data sets in the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do), 
and it allowed us to perform structure refinement 
using the conventional independent spherical atom model (ISAM) 
without geometric restraints. Even after ISAM refinement, however, 
many residual electron densities were observed around each atom 
(Extended Data Fig. 2). Peaks of residual electron density were on the 
covalent bonds and around the carbonyl oxygen atoms of peptide bonds 
(Extended Data Fig. 2a), as well as on the covalent bonds of aromatic 
rings (Extended Data Fig. 2b). In the Fe4S4 cluster, peaks were symmetrically 
distributed around the Fe atoms (Extended Data Fig. 2c). 
The charge-density information of the electron densities was analysed 
with a multipolar atom model (MAM)*. The R factor was reduced 
from 8.24% (Rfree = 8.63%) to 7.16% (Rfree = 7.80%) by applying the 
charge-density analysis. The final structure contained hydrogen atoms 
of all 83 residues. In addition, 42 hydrogen atoms of water molecules 
were included (Extended Data Table 1). The deformation map® 
reveals the distribution of valence electrons in the protein. The fine 
structural information both of hydrogen atoms and of valence electrons 
highlights detailed views of intra-molecular interactions in the 
protein. 
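
To put the data-collection geometry in context, the photon energy and the scattering angle needed to reach a given resolution follow from E = hc/λ and Bragg's law, λ = 2d sin θ. The sketch below applies both to the wavelength and resolution quoted above. The function names are illustrative, and the calculation ignores all experimental detail such as detector geometry.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # h*c expressed in keV * angstrom


def photon_energy_kev(wavelength: float) -> float:
    """Photon energy in keV for a wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength


def bragg_two_theta_deg(wavelength: float, d_spacing: float) -> float:
    """Scattering angle 2*theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    s = wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("d-spacing not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


WAVELENGTH = 0.45   # angstroms, from the text
RESOLUTION = 0.48   # angstroms, from the text

print(f"photon energy: {photon_energy_kev(WAVELENGTH):.2f} keV")
print(f"2-theta at {RESOLUTION} A: {bragg_two_theta_deg(WAVELENGTH, RESOLUTION):.1f} deg")
```

These values correspond to roughly 27.6 keV photons and reflections recorded out to a scattering angle of about 56°. The same function also shows why short wavelengths are needed: a wavelength longer than 0.96 Å cannot reach 0.48 Å resolution at any angle.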

The dihedral angle ω of Cα-C-N′-Cα′ in a peptide bond, where 
prime symbols represent the atoms in the next residue, defines the 
planarity of the peptide bond, and the planar trans peptide shows 
an angle ω of 180°. Although some non-planar peptide bonds have 
been observed in protein crystal structures determined at ultra-high 
resolutions’®, a planar geometry is generally adopted for protein structures 
at ordinary resolutions!’. In HiPIP, the peptides deviating approximately 
10° or more from the planar ω angle are mainly located around 
the cysteine residues that are covalently bound to the Fe atoms of 
the Fe4S4 cluster (Fig. 1A). The distortion of peptide bonds is also 
observed in the positions of amide hydrogen atoms (Fig. 1B). Almost 
all hydrogen atoms deviating from the C-N′-Cα′ plane are in the 
non-planar peptide bonds (Extended Data Table 2). These hydrogen 
atoms are concentrated in the proximal region of the Fe4S4 cluster. 
The deviation of hydrogen atoms is plausibly influenced by the hydrogen 
bonds they form, in which the donor hydrogen atoms are pointed 
towards the lone pair electron density of the acceptor atoms (Fig. 1B). 
The same features have also been observed for the donor hydrogen 
atoms in the crystal structures of small molecules’®. Three (Cys43, 
Cys61 and Cys75) out of four cysteine residues have distorted peptide 
bonds, while Cys46 does not (Extended Data Table 2). NMR 
data suggest that σ-type delocalization of Fe orbitals occurs through 
Fe-(Cys-Sγ) bonds'®. This may further affect distortion of the peptide 
bonds. Indeed, Cys46 has the longest Fe-(Cys-Sγ) bond (Extended 
Data Table 3). The non-planarity causes partial breakdown of the 
resonance of the peptide bonds”*”!. 
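
The ω angle discussed above is an ordinary four-atom dihedral, so its deviation from planarity can be computed directly from atomic coordinates. The following is a minimal sketch using the standard atan2 formulation of the dihedral angle. The coordinates are made-up illustrative values, not atoms from the HiPIP structure, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, e.g. CA, C, N', CA'."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def planarity_deviation_deg(omega):
    """Deviation of a trans peptide from the planar value |omega| = 180 degrees."""
    return 180.0 - abs(omega)


# Illustrative coordinates only: a trans arrangement twisted by 10 degrees.
twist = math.radians(10.0)
ca, c, n_next = (-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ca_next = (2.0, -math.cos(twist), math.sin(twist))

omega = dihedral_deg(ca, c, n_next, ca_next)
print(f"omega = {omega:.1f} deg, deviation = {planarity_deviation_deg(omega):.1f} deg")
```

Applied to every peptide bond in a model, a deviation of roughly 10° or more would flag the non-planar peptides of the kind described above.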

Non-spherical distribution of electron density is clearly observed 
around the atoms of Fe4S4(Cys-Sγ)4 (Fig. 2a). Lobes of electron density 
are symmetrically distributed around the Fe atoms, and correspond to 
the distribution of the Fe 3d-orbital electrons. The shapes of these 
electron-density lobes are quite similar among the four Fe atoms. In 
contrast to the Fe atoms, the bridging S and Cys-Sγ atoms are surrounded 
by more diffuse electron density that corresponds to the 
distribution of the S 3p-orbital electrons. The deformation electron 
densities of the individual S atoms do not exhibit any apparent similarity, 
unlike in the case of the Fe atoms. In some Fe-S bonds, the Fe 
3d electron density points towards the S 3p electron density (Fig. 2b, 
c). Similar interactions in the valence density have been reported for 
the Ni-N bond in a transition metal complex”. In general, however, the 3d-3p 
overlaps are smaller for shorter Fe-S bonds such as FE1-S2, 
and larger for longer bonds such as FE1-S3. 

Figure 1 | Crystal structure of HiPIP at 0.48 Å resolution. A, Overall 
structure of HiPIP. Non-hydrogen atoms of aromatic side-chains, non-hydrogen 
atoms of cysteine residues and atoms of the Fe4S4 cluster are 
depicted as stick models. The main-chain atoms are shown as a tube model 
and single conformational residues are coloured according to the omega 
angles (|ω|) from 160° (red) to 180° (white). The main-chain atoms of 
multi-conformational residues are coloured white. The aromatic side-chains 
are coloured dark grey. The Fe and S atoms of Fe4S4(Cys-Sγ)4 are 
coloured black and light grey, respectively. The orange, green, light blue 
and violet arrows represent the peptide bonds indicated in B. B, Hydrogen 
atoms in the peptide bonds of (a) Asn20, (b) Cys43, (c) Ile69 and (d) 
Ala76. The static deformation maps are shown as grey and cyan surfaces 
contoured at the levels of +0.1 and +0.3 electrons per cubic angstrom, 
respectively. The omit maps of amide hydrogens are shown as a pink mesh 
contoured at the 3.0σ level. 

The charge-density analysis presents the atomic charges of Fe4S4(Cys-Sγ)4 
in HiPIP (Fig. 2d and Extended Data Table 4). The total atomic 
charge of Fe4S4(Cys-Sγ)4 is −1.5. This value is close to the total of the 
formal charges of Fe4S4(Cys-Sγ)4 in the reduced state. The atomic 
charge of FE1 (+0.9) is lower than those of FE2 (+1.2), FE3 (+1.1) 
and FE4 (+1.5). The atomic charge of S4 (−1.6) is significantly lower 
than those of other bridging S atoms (from −0.5 to −0.4). The atomic 
charges of Cys43-Sγ (−1.5) and Cys75-Sγ (−1.3) are lower than those of 
Cys46-Sγ (−0.1) and Cys61-Sγ (−0.3). The sum of the charges of the 
sulfur atoms coordinating FE1 (−4.1) is more 
negative than the corresponding sums for the other three Fe atoms (−2.7 to −2.6). 
Figure 2 | The Fe4S4 cluster. a, The three-dimensional representation 
of the static deformation map of Fe4S4(Cys-Sγ)4. The orientation of the 
cluster is the same as in the left panel of Fig. 1A. The isosurface represents 
the electron density contoured at the level of +0.2 electrons per cubic 
angstrom. b, A close-up view for the plane of FE1-S3-FE4-S2. Bond 
lengths are indicated in the figure. c, The static deformation maps of the 
plane consisting of FE1, S3 and Cys43-Sγ atoms. The contour interval 
is 0.05 electrons per cubic angstrom. Blue solid, red dashed and yellow 
dashed lines denote positive, negative and zero contours, respectively. 
d, Schematic representation of Fe4S4(Cys-Sγ)4. Left, the plane of FE1-S4-FE2-S3; 
right, the plane of FE3-S1-FE4-S2. The atomic charges for each 
atom are indicated. The dashed lines indicate interactions between valence 
densities of sulfur atoms and hydrogen atoms, and the thin lines indicate 
interactions between valence densities of sulfur atoms and non-hydrogen 
atoms. 


On the basis of the Atoms in Molecules (AIM) theory, the bond topological
properties can be derived from the charge-density analysis. The Laplacian
and gradient maps display bond paths as well as the boundaries of the
respective atoms. The crossing point of the bond path and the boundary
between two atoms is defined as the bond critical point (BCP). Distortions
of the bond paths were clearly observed in the Fe-S bonds of
Fe4S4(Cys-Sγ)4 (Fig. 3a, b). BCPs exist on all Fe-S bonds (Extended Data
Table 5). The electron density at the BCP (ρBCP) for most Fe-S bonds
correlates negatively with bond length, as expected. However, the values
for FE1-S2 and FE2-S1 do not follow this rule (Fig. 3c): these two bonds
have low ρBCP values despite short bond lengths. This is consistent with
the weak 3d-3p overlaps shown in the deformation map (Fig. 2). The bond
strengths are unequal among the four Fe-S bonds of each Fe atom. The
Fe-(Cys-Sγ) bonds give the highest ρBCP values (Extended Data Table 5).
In addition, their BCPs lie closer to the Fe atoms than those of the other
three Fe-S bonds. This feature of the Fe-(Cys-Sγ) bonds may contribute to
the distortion of the Fe4S4 cluster from Td symmetry, in addition to the
spin coupling scheme and the difference in bond length. Cys46, which is
the only cysteine residue with a planar peptide bond, gives the lowest
ρBCP among the four Fe-(Cys-Sγ) bonds (Extended Data Table 5). This may be
further evidence for the weaker interaction between FE2 and Cys46 stated
above. The Laplacian of the electron density (∇²ρ) at all BCPs in
Fe4S4(Cys-Sγ)4 is positive, as in other transition metal complexes.
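
The trend between bond length and ρBCP can be checked numerically. The sketch below is illustrative only: it uses the twelve bridging Fe-S bonds from Extended Data Table 5 (bond labels assigned by matching distances to Extended Data Table 3), and the Pearson coefficient is our choice of statistic, not necessarily the one underlying Fig. 3c.

```python
from math import sqrt

# (bond, length in angstrom, rho_BCP in e/A^3), transcribed from
# Extended Data Table 5. Labels are an assumption: they were assigned by
# matching each distance to the bond lengths in Extended Data Table 3.
BRIDGING_FE_S = [
    ("FE1-S2", 2.2256, 0.54), ("FE1-S3", 2.3212, 0.61),
    ("FE1-S4", 2.3096, 0.44), ("FE2-S1", 2.2177, 0.61),
    ("FE2-S3", 2.3064, 0.51), ("FE2-S4", 2.3185, 0.50),
    ("FE3-S1", 2.3150, 0.65), ("FE3-S2", 2.3110, 0.59),
    ("FE3-S4", 2.2489, 0.73), ("FE4-S1", 2.2972, 0.54),
    ("FE4-S2", 2.3015, 0.70), ("FE4-S3", 2.2705, 0.54),
]

# The two inter-subcluster bonds singled out in the text.
OUTLIERS = {"FE1-S2", "FE2-S1"}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def length_density_correlation(bonds):
    """Correlation between bond length and rho_BCP for a list of bonds."""
    return pearson([b[1] for b in bonds], [b[2] for b in bonds])


r_all = length_density_correlation(BRIDGING_FE_S)
r_trimmed = length_density_correlation(
    [b for b in BRIDGING_FE_S if b[0] not in OUTLIERS]
)
print(f"all bridging bonds:      r = {r_all:.2f}")
print(f"without FE1-S2, FE2-S1:  r = {r_trimmed:.2f}")
```

Under these assumptions the correlation is negative in both cases and becomes more strongly negative once FE1-S2 and FE2-S1 are left out, in line with the description of these two bonds as exceptions.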

The contributions of valence electrons to the interactions between
Fe4S4(Cys-Sγ)4 and the protein environment are clearly visualized in the
charge-density analysis (Fig. 2d). Valence electrons of the Cys-Sγ atoms
of Cys43, Cys46, Cys61 and Cys75 interact with the carbonyl oxygen atom of
Asn70 and the H atom of Ile69-Cδ1 (Extended Data Fig. 3a), the main-chain
amides of Phe48 and Thr79 (Fig. 4a), the main-chain amide of Leu63 and the
H atom of Phe64-Cδ2 (Extended Data Fig. 3b), and the main-chain amide of
Ser77 (Extended Data Fig. 3c), respectively. Valence electrons of S1, S2
and S3 interact with the H atom of Phe48-Cδ2 (Extended Data Fig. 3d), the
H atoms of Phe64-Cε2 and Ile69-C,, (Extended Data Fig. 3e), and the H atom
of Trp78-Cδ1 (Fig. 4b). No interactions are observed between S2 and the H
atom of Tyr19-Cδ1 (Extended Data Fig. 3e), or between S3 and the
main-chain amide of Cys75 (Fig. 4b), although these are in close
proximity. The S4 atom does not interact with the H atoms of Cys43-Cβ and
Cys46-Cβ (Extended Data Fig. 3f). In addition, the S4 atom is not
surrounded by aromatic side chains. The interaction between valence
electrons and H atoms correlates with the atomic charges of the S atoms.
S4, with the most negative atomic charge (−1.6), has no such interactions,
whereas the other bridging S atoms, with less negative charges (−0.5 to
−0.4), do (Fig. 2d). This rule also holds for the Cys-Sγ atoms: Cys46-Sγ
and Cys61-Sγ, with less negative charges (−0.1 and −0.3), have two
interactions, whereas Cys43-Sγ and Cys75-Sγ, with more negative charges
(−1.5 and −1.3), have only one interaction. Charge transfer from S to H
atoms through the interactions may be involved in the reduction of the
negative charges of S. Taken together with the atomic charges and
topological properties, detailed views of interactions between the cluster
and surrounding ligands imply that the FE1 atom, together with the FE2, S4
and Cys43-Sγ atoms, is crucial for storing electronic charges in the
reduced HiPIP. The atomic charges in this paper were derived from the AIM
analysis. It has been shown that the AIM analysis provides the most
reliable estimation of the electronic properties for model clusters. In
our study, the absolute values of the AIM charges are generally larger
than charges derived from the multipolar parameters (Extended Data
Table 4). This can be attributed to charge transfer from S to Fe atoms.
However, it is difficult to discuss this only from our results.

Figure 3 | Topological analysis of charge density in the Fe4S4 cluster.
a, The Laplacian ∇²ρ map in the plane of FE1-S3-FE4-S2. Contour
interval is 0.05 electrons per angstrom to the fifth power. Red solid lines
represent negative values and blue dashed lines represent positive values.
BCPs are shown as crosses. b, The gradient vector fields in the same plane
as in a. c, Relationship between bond length (Å) and ρBCP. Filled circles
in blue, pink and green are for intra-subcluster Fe-S, inter-subcluster
Fe-S and Fe-(Cys-Sγ), respectively.

Figure 4 | Interaction network around the Fe4S4 cluster. a, Deformation
electron density around the Cys46-Sγ atom. The main-chain amides of Phe48
and Thr79 are located close to the Cys46-Sγ atom. The static deformation
maps are shown as grey and cyan surfaces contoured at the levels of +0.1
and +0.3 electrons per cubic angstrom, respectively, and the omit map of
hydrogen atoms is shown as a pink mesh contoured at the 3.0σ level. The
dashed lines indicate interactions between valence densities of sulfur
atoms and hydrogen atoms. b, Deformation electron density around S3 of the
Fe4S4 cluster. The main-chain amide of Cys75 and the H atom of Trp78-Cδ1
are located close to the S3 atom.

The cuboid [Fe4S4]2+ cluster is divided into two rhombic [Fe2S2]+
subclusters. Each subcluster contains two ferromagnetically coupled Fe
atoms with d5.5 configuration (S = 9/2). The two subclusters are
antiferromagnetically coupled to each other to give S = 0. In HiPIP from
T. tepidum, one subcluster consists of the FE1, FE2, S3 and S4 atoms and
is coordinated by Cys43 and Cys46, whereas the other subcluster consists
of the FE3, FE4, S1 and S2 atoms and is coordinated by Cys61 and Cys75.
This assignment takes into consideration the results for HiPIP from
Chromatium vinosum. The assignment is consistent with the overlap manner
in the Fe-S bonds and with the topological analysis, in which FE1-S2 and
FE2-S1, which connect the two subclusters, show a distinctive feature. It
was suggested that the redox reaction is localized in one of the two
subclusters. This is likely to be consistent with our implication for
electron-storing atoms.

The charge-density analysis of this study experimentally reveals a 
subatomic structure of respective atoms in the polynuclear metal cluster 
in the HiPIP. This result, in combination with spectroscopic and com- 
putational data, will contribute to the understanding of the relationship 
between structure and function of metalloproteins. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 
X-ray diffraction experiment. Crystals of DTT-reduced HiPIP were prepared as
previously described. The X-ray wavelength was set to 0.45 Å (27.6 keV) at the
BL41XU beamline of SPring-8 (proposal numbers 2008B-1337, 2010A1237 and
2010B1284 to K.T.). The diffraction intensities were measured using a Rayonix
MX-225 CCD detector. The crystals were cooled to 100 K during the data
collection using a nitrogen-gas stream. Three data sets for reflections in the
ultra-high-, medium- and low-resolution regions were collected separately at
different positions of a single crystal (0.8 mm × 0.2 mm × 0.1 mm), and the
data sets for the medium- and low-resolution regions were collected with
attenuated X-ray exposure. The data set for ultra-high resolution was collected
using a helical data collection procedure with a microbeam of 50 µm × 50 µm.
The maximum
dose in each irradiated position was limited to ~6 x 10° Gy. The dose was estimated 
using the RADDOSE program. The diffraction data sets were integrated and
scaled using the HKL2000 program package. The relative B factors, Rsym and
other crystallographic statistics, which serve as metrics for radiation damage,
did not show significant changes during the data collection (Extended Data Fig. 1).
The three data sets for the ultra-high-, medium- and low-resolution regions were
merged into a complete data set.
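
The quoted photon energy and the scattering angle needed to reach 0.48 Å can be reproduced with two textbook relations, E = hc/λ and Bragg's law λ = 2d sin θ. The following is a minimal sketch; the function names are ours and the constant is the standard value of hc in keV·Å.

```python
from math import asin, degrees

# h*c expressed in keV * angstrom (standard physical constant).
HC_KEV_ANGSTROM = 12.398419843


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy in keV for a given X-ray wavelength in angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def two_theta_deg(wavelength_angstrom: float, d_spacing_angstrom: float) -> float:
    """Scattering angle 2*theta (degrees) from Bragg's law, first order."""
    sin_theta = wavelength_angstrom / (2.0 * d_spacing_angstrom)
    return 2.0 * degrees(asin(sin_theta))


energy = photon_energy_kev(0.45)
angle = two_theta_deg(0.45, 0.48)
print(f"E = {energy:.1f} keV")
print(f"2-theta at 0.48 A resolution = {angle:.1f} degrees")
```

The first line of output agrees with the 27.6 keV stated above. The short wavelength is what keeps the scattering angle for the 0.48 Å shell within reach of a flat detector.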
Structure refinement with the ISAM. The test set for free R factor calculation was
set to be the same as in the 0.7 Å-resolution data of HiPIP in the resolution range
20-0.7 Å and was extended to 0.48 Å resolution by randomly selecting 5% of
reflections. Structure refinement was started from the low-dose structure of the
reduced form of HiPIP (Protein Data Bank accession number 3A39) using the SHELXL
program. Positional and anisotropic displacement parameters were refined for
all non-hydrogen atoms. Restraints for bond lengths and angles were removed for
non-hydrogen atoms of single-conformation residues and for the Fe4S4 cluster.
All of the hydrogen atoms of amino-acid residues were included in the model, and
only the hydrogen atoms of water molecules that were observed in the Fo − Fc
map contoured at the 2.0σ level were included. Bond lengths and angles involving
hydrogen atoms were constrained using the riding model in the SHELXL program.
The final structure in the SHELXL refinement contains 34 multiple-conformation
residues and was refined to Rwork and Rfree factors of 8.24% and 8.63%,
respectively.
Charge-density analysis with the MAM. The charge-density analysis with the
MAM was performed using the MoPro program. The refinement for HiPIP followed
the procedure for human aldose reductase, with slight modifications. The
scale factors were refined in the whole resolution range 20-0.48 Å. Bulk solvent
parameters with the scale factors were refined in the resolution range 20-1.0 Å.
The subsequent refinement processes were performed for the non-hydrogen atoms
of single-conformation residues, the atoms in the Fe4S4 cluster and the oxygen
atoms of water molecules with two hydrogen atoms (H2O). Positional and
anisotropic displacement parameters were refined in the whole resolution range.
Then, positional and anisotropic displacement parameters were refined using only
high-resolution reflections (high-order refinement). In the high-order refinement,
the resolution ranges used for refinement were changed sequentially: 1.0-0.48 Å,
0.9-0.48 Å, 0.8-0.48 Å, 0.7-0.48 Å and 0.65-0.48 Å. Positions of hydrogen atoms
were changed to the standard geometry determined by neutron diffraction
experiments of small molecules and were fixed in the subsequent refinement steps.
After the high-order refinement, strong peaks remained near the atom positions
of the Fe4S4 cluster in the residual electron-density map. The short path lengths
of low-resolution reflections cause an incomplete absorption of the high-energy
X-rays in the CCD detector. To correct the resolution dependence of the
absorption in the CCD detector, we used an exponential function by which the
observed structure factors were fitted to the calculated ones:

$$F'_{\mathrm{obs}}(hkl) = \frac{F_{\mathrm{obs}}(hkl)}{EF}, \qquad EF = \sum_{i=1}^{3} a_i \exp\{b_i(\sin\theta/\lambda - c_i)\} + d$$

where the coefficients a_i, b_i, c_i (i = 1, 2, 3) and d were fitted to
⟨|Fo|/|Fc|⟩ in 50 resolution bins by a least-squares method.
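
To make the form of the correction concrete, the sketch below evaluates EF and applies it to a single amplitude. It only illustrates the functional form described above: the coefficient values are hypothetical placeholders, not the fitted values from this study, and the least-squares fit itself is not shown.

```python
from math import exp


def correction_factor(s, terms, d):
    """EF = sum_i a_i * exp(b_i * (s - c_i)) + d, with s = sin(theta)/lambda.

    `terms` is a sequence of (a_i, b_i, c_i) tuples; three are used in the text.
    """
    return sum(a * exp(b * (s - c)) for a, b, c in terms) + d


def corrected_amplitude(f_obs, s, terms, d):
    """Divide an observed structure-factor amplitude by EF at its resolution."""
    return f_obs / correction_factor(s, terms, d)


# Hypothetical coefficients, chosen only so that the example runs.
TERMS = [(0.10, -4.0, 0.1), (0.05, -2.0, 0.3), (0.02, -1.0, 0.5)]
D = 1.0

for s in (0.05, 0.5, 1.0):  # sin(theta)/lambda in 1/angstrom
    ef = correction_factor(s, TERMS, D)
    print(f"s = {s:.2f}  EF = {ef:.4f}  F' = {corrected_amplitude(100.0, s, TERMS, D):.2f}")
```

With negative b_i, as assumed here, EF deviates most from d at low sin θ/λ, which is the regime in which the text describes incomplete absorption.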

The MAM is expressed as shown in equation (1):

$$\rho_{\mathrm{atom}}(\mathbf{r}) = \rho_{\mathrm{core}}(r) + P_{\mathrm{val}}\,\kappa^{3}\rho_{\mathrm{val}}(\kappa r) + \sum_{l=0}^{l_{\max}} \kappa'^{3} R_{nl}(\kappa' r) \sum_{m=0}^{l} P_{lm\pm}\, y_{lm\pm}(\theta, \varphi) \qquad (1)$$

Here ρatom, ρcore and ρval represent the total, spherical core and spherical
valence electron densities, and Pval and Plm± are the spherical valence and
deformation multipole populations. Rnl are the Slater-type radial functions and
ylm± the real spherical harmonics. The κ and κ′ parameters are electron-density
expansion/contraction coefficients. The parameters used for the least-squares
refinement are Pval, Plm±, κ and κ′. Multipolar parameters were transferred to
all of the amino-acid residues and H2O molecules from the experimental library
multipolar atom model (ELMAM). The multipolar parameters were assigned to the
octupole level (lmax = 3) for C, N and O atoms, the hexadecapole level
(lmax = 4) for S and Fe atoms and the dipole level (lmax = 1) for H atoms.
Solvent atoms other than H2O were treated as spherical with a neutral charge.
The definition of local axes was derived from the multipolar library for the
atoms of the amino-acid residues and H2O. The local axes for the sulfur atoms
of Fe4S4(Cys-Sγ)4 were defined to coincide with the directions of the unit cell
axes. The d-orbital populations for the local axes (x, y, z) were calculated and
are listed in Extended Data Table 6. The local axes were set as shown in
Extended Data Fig. 4, according to refs 38, 39.
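
The truncation level lmax fixes how many multipole populations each atom type can carry. As a small worked example, and assuming only the double sum in equation (1) (one m = 0 term plus a ± pair for every m from 1 to l), the count per atom is (lmax + 1)²:

```python
def n_multipole_populations(l_max: int) -> int:
    """Number of P_lm+- terms in the double sum of equation (1).

    Each level l contributes one m = 0 term and a +/- pair for m = 1..l,
    that is 2l + 1 terms, so the total is (l_max + 1) ** 2.
    """
    return sum(2 * l + 1 for l in range(l_max + 1))


# Truncation levels as stated in the text.
LEVELS = {
    "H (dipole)": 1,
    "C, N, O (octupole)": 3,
    "S, Fe (hexadecapole)": 4,
}

for label, l_max in LEVELS.items():
    print(f"{label}: {n_multipole_populations(l_max)} populations")
```

This count includes the monopole term (l = 0); whether every term is actually refined depends on the constraints applied to each atom.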

Initial values of multipolar parameters for Fe4S4(Cys-Sγ)4 were unavailable in
ELMAM. In addition, the atomic charge of the Fe atom is difficult to define
properly in the multipolar refinement because of the diffuse distribution of the
4s electrons. In the multipolar refinement of HiPIP, the initial values of the
Pval and κ parameters were obtained from a grid search procedure for Fe, S and
Cys-Sγ atoms. The κ′ parameter was fixed to 1.00. All of the Plm± parameters
(l > 0) were refined for the atoms of Fe4S4(Cys-Sγ)4, but the monopole P00
parameters were fixed to zero because the P00 parameter has a high correlation
with the Pval parameter. Finally, the initial values of the Pval and κ
parameters were determined to be 7.5 and 1.10 for Fe atoms, 6.5 and 0.99 for
S atoms and 6.5 and 0.99 for Cys-Sγ atoms.

After the grid search, the Plm± parameters were refined for single-conformation
residues, Fe4S4(Cys-Sγ)4 and H2O molecules. Following the refinement of
the Plm± parameters, positional and anisotropic displacement parameters were
refined for the non-hydrogen atoms. In the refinement of the Pval parameters it
was difficult to obtain reasonable values for the atomic charges in
Fe4S4(Cys-Sγ)4. Thus, the Pval parameters were refined after the Plm±
parameters. The differences in the atomic scattering factors for Fe2+ and Fe3+
were prominent in the resolution range lower than 1.5 Å. Reasonable atomic
charges were obtained in the refinement of Pval parameters with the resolution
range 20-1.2 Å. The Pval parameters for single-conformation residues were also
refined, but the Pval parameters for hydrogen atoms were constrained to be the
same as for the chemically equivalent atoms. The total number of electrons in
the crystal was fixed during the refinement of Pval parameters. Finally, the
Rwork and Rfree factors converged to 7.16% and 7.80% (Extended Data Table 1).
Beq values for Fe4S4(Cys-Sγ)4 in the final model are listed in Extended Data
Table 4. The static deformation map in this paper was calculated by the equation

$$\Delta\rho_{\mathrm{static}}(\mathbf{r}) = \sum_{j} \left[\rho_{\mathrm{multi}}(\mathbf{r} - \mathbf{r}_j) - \rho_{\mathrm{IAM}}(\mathbf{r} - \mathbf{r}_j)\right]$$

The parameters ρmulti and ρIAM represent the electron density calculated from
the MAM and from the ISAM, respectively. The static deformation map is
calculated excluding the contribution of the atomic displacement parameters of
atoms. The two-dimensional contour maps were prepared using the VMoPro program
and the three-dimensional figures were prepared using PyMOL. Topological
analyses based on the AIM theory were performed with VMoPro. The atomic charges
by the AIM theory were calculated with Bader. The atomic charges given by the
equation q = Nval − Pval, where Nval is the number of valence electrons in the
neutral atom, were also calculated. The two methods gave highly correlated
results as listed in Extended Data Table 4.
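
As a worked example of the relation q = Nval − Pval, the sketch below converts the initial Pval values from the grid search into charges. The Nval values are an assumption on our part (8 for Fe, counting 3d and 4s electrons, and 6 for S); they are not stated explicitly in the text.

```python
# Assumed numbers of valence electrons in the neutral atoms:
# Fe 3d6 4s2 -> 8, S 3s2 3p4 -> 6. These are illustrative assumptions.
N_VAL = {"Fe": 8.0, "S": 6.0}


def charge_from_pval(element: str, p_val: float) -> float:
    """Atomic charge q = N_val - P_val for a given valence population."""
    return N_VAL[element] - p_val


# Initial P_val values reported for the grid search.
print(charge_from_pval("Fe", 7.5))  # 0.5
print(charge_from_pval("S", 6.5))   # -0.5
```

With these assumptions the grid-search starting point corresponds to Fe atoms carrying +0.5 and S atoms carrying −0.5 before the Pval parameters are refined.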
Extended Data Figure 1 | Quality of the diffraction data at 0.48 Å
resolution. a, The diffraction image. Right, zoom view of the boxed
region at left. The resolution for each circle is indicated. b, Rsym (blue) and
<I>/<σ(I)> (pink) values are plotted for 30 resolution bins. c, Changes
of Rsym at the highest-resolution shell (0.50-0.48 Å) and relative B factor in
the course of the data collection. The Rsym (blue) and relative B factor (red)
are plotted as functions of frame number.

Extended Data Figure 2 | Residual electron density for each refinement
step. The left panels show the residual density after the ISAM refinement;
the right panels show the residual density after the MAM refinement.
a, The plane of the peptide bond between Asn45 and Cys46. Maximum
and minimum peaks are 0.33 and −0.22 electrons per cubic angstrom for
the ISAM analysis, and 0.18 and −0.20 electrons per cubic angstrom for
the MAM analysis. b, The plane of the aromatic ring of Trp74. Maximum
and minimum peaks are 0.34 and −0.29 electrons per cubic angstrom
for the ISAM analysis, and 0.23 and −0.23 electrons per cubic angstrom
for the MAM analysis. c, The Fe4S4 cluster. The plane consists of FE1, S3
and Cys43-Sγ atoms. Maximum and minimum peaks are 0.60 and −0.35
electrons per cubic angstrom for the ISAM analysis, and 0.35 and −0.29
electrons per cubic angstrom for the MAM analysis. The contour interval
is 0.05 electrons per cubic angstrom for all figures. Blue solid, red dashed
and yellow dashed lines denote positive, negative and zero contours,
respectively.
Extended Data Figure 3 | Interaction network around the Fe4S4 cluster.
a, Deformation electron density around the Cys43-Sγ atom. The main-
chain oxygen atom of Asn70, the main-chain carboxyl carbon atom of
Gly73 and the H atom of Ile69-Cδ1 are located close to Cys43-Sγ. The static
deformation maps are shown as grey and cyan surfaces contoured at the
levels of +0.1 and +0.3 electrons per cubic angstrom, respectively. The
omit map of hydrogen atoms is shown as a pink mesh contoured at the
3.0σ level. The dashed lines indicate interactions between valence densities
of sulfur atoms and hydrogen atoms. b, Deformation electron density
around Cys61-Sγ. The main-chain amide of Leu63 and the H atom of
Phe64-Cδ2 are located close to Cys61-Sγ. c, Deformation electron density
around Cys75-Sγ. The main-chain amide of Ser77 is located close to
Cys75-Sγ. d, Deformation electron density around S1 of the Fe4S4 cluster.
The H atom of Phe48-Cδ2 and the Cδ1 atom of Leu63 are located close to S1.
e, Deformation electron density around S2 of the Fe4S4 cluster. The
H atoms of Tyr19-Cδ1, Phe64-Cε2 and Ile69-C., are located close to S2.
f, Deformation electron density around S4 of the Fe4S4 cluster. The H atom
of Cys43-Cβ, the H atom of Cys46-Cβ and the amide nitrogen atom of
Met49 are located close to S4.
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local axes of each Fe atom (FE1—FE4). The static deformation maps of Fe4S4(Cys-S.)4 are represented as grey isosurfaces contoured at the level of 
+0.2 electrons per cubic angstrom. 
Extended Data Table 1 | Data collection and refinement statistics

                                HiPIP
Data collection
  Space group                   P2₁2₁2₁
  Cell dimensions
    a, b, c (Å)                 46.48, 58.91, 23.44
    α, β, γ (°)                 90, 90, 90
  Resolution (Å)                20-0.48 (0.50-0.48)*
  Rsym or Rmerge (%)            5.6 (33.9)
  I/σI                          61.1 (2.7)
  Completeness (%)              96.3 (89.0)
  Redundancy                    5.4 (3.0)

Refinement
  Resolution (Å)                20-0.48
  No. reflections               301,119
  Rwork/Rfree (%) (ISAM)        8.24/8.63
  Rwork/Rfree (%) (MAM)         7.16/7.80
  No. non-H atoms
    Protein
    Ligand/ion
    Water
  No. H atoms
    Protein
    Ligand/ion
    Water

*Highest-resolution shell is shown in parentheses.
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Extended Data Table 2 | Dihedral and improper angles 


Tyr19—Asn20 
Asp22-—Ala23 
Ala23—-Thr24 
Ala31—Ala32 
Gln41—His42 
His42—Cys43 
Cys43—Ala44 
Ala44—Asn45 
Gly60-Cys61 
Leu68—Ile69 

Gly73-Thr74 
Thr74—Cys75 
Cys75—Ala76 
Ala76—Ser77 


Dihedral angle (°) 
162.0 
-177.0 
177.0 
176.4 
168.3 
-170.9 
-175.2 
-179.5 
166.7 
170.0 
179.6 
163.7 
-172.8 
170.7 


Improper angle* (°) 
1.78 
0.98 


«The improper angle is an angle between the C-N’-C,,’ and H’/-N’-C,,’ planes. 
Extended Data Table 3 | Geometrical parameters in Fe4S4(Cys-Sγ)4


Distance (A) Angle (°) 
FE1-S2 2.2256 (3)* FE1-S2-FE4 73.12 (1)* 
FE1-S4 2.3096 (4) FE1-S2-FE3 73.53 (-) 
FE1-S3 2.3212 (4) FE1-S4-FE3 73.13 (1) 
FE2-S1 2.2177 (3) FE1-S4-FE2 72.81 (1)
FE2-S3 2.3064 (3) FE1-S3-FE4 71.94 (1) 
FE2-S4 2.3185 (4) FE1-S3-FE2 72.81 (1) 
FE3-S4 2.2489 (3) FE2-S1-FE4 73.68 (-) 
FE3-S2 2.3110 (4) FE2-S1-FE3 73.14 (2) 
FE3-S1 2.3150 (4) FE2-S3-FE4 72.54 (2) 
FE4-S3 2.2705 (3) FE2-S4-FE3 72.51 (1) 
FE4-S1 2.2972 (4) FE3-S2-FE4 72.31 (1) 
FE4-S2 2.3015 (4) FE3-S1-FE4 72.31 (1) 
FE1-FE2 2.7466 (3) S1-FE2-S3 104.73 (2) 
FE1-FE3 2.7161 (2) S1-FE2-S4 105.32 (1) 
FE1-FE4 2.6974 (3) S1-FE4-S3 103.34 (1) 
FE2-FE3 2.7018 (3) S1-FE4-S2 106.39 (1) 
FE2-FE4 2.7077 (3) S1-FE3-S4 104.42 (1) 
FE3-FE4 2.7212 (3) S1-FE3-S2 105.49 (1) 
FE1-(Cys43-S,) 2.2563 (3) S2-FE1-S4 104.35 (2) 
FE2-(Cys46-S,) 2.2824 (3) S2-FE1-S3 105.51 (1) 
FE3-(Cys61-S,) 2.2626 (4) S2-FE4-S3 104.70 (1) 
FE4-(Cys75-S,) 2.2709 (3) S2-FE3-S4 103.56 (1) 
(Tyr19-Cg))-S2 3.673 (2) S3-FE2-S4 105.06 (1) 
(Phe48-N)-(Cys46-S,) 3.444 (1) S3-FE1-S4 104.87 (1) 
(Phe48-Cg)-S1 3.792 (2) (Cys43-S,)-FE1-S2 113.49 (1) 
(Leu63-N)-(Cys61-S,) 3.375 (1) (Cys43-S,)-FE1-S4 111.60 (2) 
(Leu63-C5))-S1 3.525 (2) (Cys43-S,)-FE1-S3 115.99 (2) 
(Phe64-Cs2)-(Cys61-S,) 3.716 (2) (Cys46-S,)-FE2-S1 114.63 (1) 
(Phe64-C,»)-S2 3.964 (2) (Cys46-S,)-FE2-S3 115.38 (1) 
(Ile69-C,)-S2 3.893 (2) (Cys46-S,)-FE2-S4 110.79 (1) 
(Asn70-O)-(Cys43-S,) 3.357 (1) (Cys61-S,)-FE3-S4 117.52 (2) 
(Cys75-N)-S3 3.401 (1) (Cys61-S,)-FE3-S2 119.16 (2) 
(Ser77-N)-(Cys75-S,) 3.365 (1) (Cys61-S,)-FE3-S1 105.22 (2) 
(Trp78-Cg1)-S3 3.771 (1) (Cys75-S,)-FE4-S3 126.49 (2) 
(Thr79-N)-(Cys46-S,) 3.526 (1) (Cys75-S,)-FE4-S1 104.97 (2) 
(Cys75-S,)-FE4-S2__ 109.41 (1) 


*Values in parentheses are estimated standard deviations given by full-matrix least-squares refinement. 
Extended Data Table 4 | Atomic properties of Fe4S4(Cys-Sγ)4

Atom        Charge (AIM)   Charge (Nval − Pval)   Beq factor (Å²)
FE1         +0.92          +0.42 (12)*            2.32
FE2         +1.15          +0.58 (12)             2.31
FE3         +1.13          +0.97 (12)             2.40
FE4         +1.46          +1.33 (12)             2.33
S1          -0.39          -0.14 (27)             2.93
S2          -0.51          -0.36 (27)             2.47
S3          -0.52          -0.21 (28)             2.40
S4          -1.57          -1.40 (28)             2.46
Cys43-Sγ    -1.48          -1.38 (28)             2.51
Cys46-Sγ    -0.12          -0.16 (28)             2.36
Cys61-Sγ    -0.27          -0.20 (28)             2.82
Cys75-Sγ    -1.27          -1.11 (29)             2.44
Total       -1.47          -1.69

*Values in parentheses are estimated standard deviations given by full-matrix least-squares refinement.
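
As a simple consistency check on the AIM column, the twelve atomic charges listed in the table can be summed and compared with the stated total. The values below are transcribed from the table and the variable names are ours.

```python
# AIM charges transcribed from Extended Data Table 4.
AIM_CHARGES = {
    "FE1": 0.92, "FE2": 1.15, "FE3": 1.13, "FE4": 1.46,
    "S1": -0.39, "S2": -0.51, "S3": -0.52, "S4": -1.57,
    "Cys43-SG": -1.48, "Cys46-SG": -0.12,
    "Cys61-SG": -0.27, "Cys75-SG": -1.27,
}

# Sum over the whole Fe4S4(Cys-SG)4 unit, rounded to the table's precision.
total = round(sum(AIM_CHARGES.values()), 2)
print(total)  # -1.47
```

The sum reproduces the total of -1.47 given in the last row of the AIM column.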
Extended Data Table 5 | Topological parameters at BCPs of Fe-S bonds

Bond               d (Å)     d1* (Å)   ρ (e/Å³)   ∇²ρ (e/Å⁵)
FE1-S2             2.2256    1.058     0.54       1.4
FE1-S3             2.3212    1.024     0.61       3.7
FE1-S4             2.3096    1.093     0.44       3.4
FE2-S1             2.2177    0.966     0.61       5.2
FE2-S3             2.3064    1.056     0.51       2.5
FE2-S4             2.3185    1.131     0.50       2.2
FE3-S1             2.3150    1.050     0.65       2.3
FE3-S2             2.3110    1.049     0.59       1.7
FE3-S4             2.2489    0.973     0.73       5.5
FE4-S1             2.2972    1.056     0.54       2.0
FE4-S2             2.3015    1.024     0.70       3.8
FE4-S3             2.2705    1.058     0.54       1.0
FE1-(Cys43-Sγ)     2.2563    1.001     0.72       2.3
FE2-(Cys46-Sγ)     2.2824    1.034     0.61       4.6
FE3-(Cys61-Sγ)     2.2626    0.987     0.84       0.0
FE4-(Cys75-Sγ)     2.2709    0.944     0.73       5.0

*Parameter d1 is the distance between the Fe atom of the pair and the BCP.
Extended Data Table 6 | The d-orbital populations of iron atoms

        z²        x²−y²     xz        yz        xy        total
FE1     2.00      1.99      0.98      0.65      1.97      7.59
        (26.4%)   (26.3%)   (12.9%)   (8.6%)    (25.9%)
FE2     1.76      1.93      1.98      0.83      0.91      7.43
        (23.7%)   (26.0%)   (26.7%)   (11.2%)   (12.3%)
FE3     1.99      2.00      1.01      1.55      0.49      7.04
        (28.3%)   (28.4%)   (14.4%)   (22.0%)   (7.0%)
FE4     1.99      1.05      0.64      0.99      1.99      6.68
        (29.9%)   (15.8%)   (9.7%)    (14.9%)   (29.9%)
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Scientific illustrator Victor Leshyk used a sketch from researchers (left) to create a conception of the Gilboa Fossil Forest for a cover of Nature (right). 


SCIENCE ILLUSTRATION 


Picture perfect 


Enlisting the help of an illustrator can add impact to research papers and outreach projects. 


BY JYOTI MADHUSOODANAN 


On canvas, a 390-million-year-old forest springs to life. Massive tree
trunks jut into a sunlit clearing from a crowded forest floor. Stubby
green branches battle with frilly leaf-like filaments to touch the
pink-tinged sky. Palaeobotanist Chris Berry had worked for years with
samples from the Gilboa Fossil Forest in New York, but had never before
seen what the living forest might have looked like so many millennia ago.

Dubbed ‘Lost Worlds’, the digital oil painting was created by Victor
Leshyk to accompany a 2012 research paper in Nature by Berry and his
colleagues (W. E. Stein et al. Nature 483, 78-81; 2012). It was
commissioned to appear on the cover of the journal and Berry features it
in his talks today, especially those for lay audiences. It was Berry’s
first experience in teaming up with a scientific illustrator, and Leshyk’s
work exceeded his expectations. “It was very prestigious for us to have it
on the cover, and the image proved very good for engagement and
outreach,” he says. Berry, who is based at the University of Cardiff, UK,
has collaborated with artists twice since then, for press releases and
museum exhibitions that involve his research, and he is discussing a
second project with Leshyk. “If you’ve got a story you want to get out
there and you’ve got a really good image,” he says, “it will fly a lot
farther than just words.”

The use of striking images to accompany manuscripts and outreach efforts
is growing as more journal publishers are requiring graphical abstracts
(depictions of a paper’s main thrust or concept) to accompany studies.
These commissioned illustrations differ from the everyday photograph,
sketch or overview figure that usually accompanies research manuscripts
or talks. They get to the core of concepts; they may also depict
unobservable phenomena, ranging from subatomic particles to what extinct
life forms might have looked like. Although working on such images with an
illustrator might seem like a lot of extra toil, and paying for their
services extravagant, the benefits of skilled artistic presentation can be
manifold.

Visually stunning representations that 
result from collaborations between scientists 
and artists can grab millions of online views, 
and attract a much wider audience than a non- 
illustrated paper, both of which are particularly 
useful for researchers whose grant applications 
or funding proposals require them to show a 
public-outreach component. They are also 
more likely to be written about and shared 
digitally, helping to raise the visibility of a sci- 
entist’s work, attract more students to a lab, 
boost career standing and improve chances of 
garnering funding. They can even inspire new 
experiments — or reveal gaps in knowledge. 

Even when photographs or images already 
exist, hand-rendered or digital illustrations 
and 3D animations can clarify and enhance the 
technical details of a key data point or finding 
— exactly how proteins latch onto the surface 
of DNA, for example, or the shape of butter- 
fly larvae that are usually hidden in leaf litter. 
Scientists who want to examine their research 
question or findings more fully, to ‘see’ their 
data or to provide a pictorial boost to their 
manuscript should consider teaming up with 
an illustrator. Scientific artists can also help to 
create artwork for a project's website, or explain 
hard-to-grasp concepts with short videos. 


LEARNING POINT 
Most such collaborations begin when research- 
ers are writing a paper, but it can be helpful to 
start even earlier (see ‘Turn science into art’).
Discussing with an artist how best to depict a 
mechanism or process — what to include and 
exclude, how molecules, stars or fossils should 
be positioned relative to one another — can 
help researchers to hone their hypothesis, 
reveal points of disagreement between authors 
and even identify holes in understanding. 

Chemist Lauren Benz of the University of 
San Diego, California, found that talking with 
an illustrator helped her to uncover impor- 
tant issues that she had not considered when 
she started drafting her review article about 
the applications of membranes made from 
polymers and other materials. She had com- 
missioned freelance artist Mary O’Reilly, who 
earned her PhD in biological chemistry from 
the Massachusetts Institute of Technology in 
Cambridge, to help illustrate how these mem- 
branes work at the molecular level. O’Reilly 
asked whether she should depict molecules 
filtering through a particular spot in the mem- 
brane, and Benz and her collaborators realized 
that they didn’t know exactly where the filtering 
happened. 

“It made me question some assumptions I had about the filtration
mechanism, and going back and forth with Mary helped us come up with some
research questions we could ask going forward,” she says. She is now
planning experiments to tackle them.

Scientific illustration can encapsulate information that is not easily or
often conveyed by text, line drawings or simple graphics. But it can also
be used when direct imagery such as photographs is impractical or even
impossible. Biologist Jessica Linton, who works with the Canadian
consulting firm Natural Resource Solutions in Waterloo, was working on a
recovery strategy for the endangered mottled duskywing butterfly (Erynnis
martialis) when she realized that there were no available images of the
creature’s microscopic eggs and pupae, which tend to be buried in soil
under leaf litter, and are extremely difficult to photograph.

Armed with scientific descriptions, she 
turned to illustrator Emily Damstra, whom 
she had met through a local butterfly enthu- 
siasts group. Damstra’s illustrations — which 
are now included in the Ontario government's 
policy document outlining the recovery strat- 
egy — received enthusiastic appreciation from 
butterfly researchers and ecologists. 

For those who work at the molecular level, illustrations and videos often
provide the first visualization of materials or concepts that the
researchers might have worked on for years, and it can be a revelation. As
a graduate student, Janet Iwasa often found herself and her lab mates
resorting to stick-figure drawings or waving their hands around to depict
the movements of the protein they were studying: kinesin, which scuttles
along skeletal filaments inside cells. “Scientific information was often
lost,” she says. “The first time I really understood how kinesin worked
was when my principal investigator hired an animator to illustrate it.”
(In part because of her frustration over this, she left bench research
after completing a postdoc and now works on molecular visualization in her
post at the University of Utah in Salt Lake City.)

These depictions can offer surprising perspectives. “Sometimes, you need
an image to tell the story effectively,” says visual science communicator
Kate Patterson of the Garvan Institute of Medical Research in Sydney,
Australia. “They can also be question-generating, as scientists start to
think about what they’re seeing in a new way.” When Patterson showed some
researchers her animation of how DNA can be modified at the chemical
level, a lively discussion ensued about how the process works. Thanks to
the animation, the group began to consider the physical arrangements of
molecules inside the nucleus, rather than just the chemistry or enzymes
involved.

Molecular visualizations of structures such as HIV can point researchers
to new avenues of investigation.

Working with illustrators can also help 
scientists to hone their own skills at presenting 
data in images. Cell biologist Matt Thomson 
at the University of California, San Francisco, 
says that collaborating with science illustrator 
Jessica Huppi for his study on embryonic cells 
taught him to prune less-relevant details for 
better impact, and that colour and layout can 
often convey information more effectively than 
text labels. 

The paper showed that genes in growing 
embryonic cells can be controlled by light 
(C. Sokolik et al. Cell Syst. 1, 117-129; 2015), 
and Huppi’s illustrations helped him to real- 
ize that there were many ways of conveying 
information visually. Seeing how Huppi used 
effects such as colours, shapes and relative sizes 
has helped him to represent data effectively in 
subsequent work, he says. “Working with an 
illustrator gives you a chance to learn how to 
approach this type of process of thinking visu- 
ally — how you convey time in a drawing, or 
how you can convey cause and effect.” 


CONCEPTUAL APPROACH 

Many researchers who have worked with 
illustrators say that they expect to do so again. 
But they note that the time needed to produce 
good artwork can add weeks to preparing 
a paper, and the expense of hiring a profes- 
sional ranges from a few hundred dollars to 
thousands. This kind of time and money is not 
always defensible. Benz says that illustrations
are useful for portraying general ideas or concepts,
but that simple data can often be conveyed clearly
in charts and graphs. Thomson cautions against
enlisting professional help just to make a paper
more decorative. Scientists who want to save money
and create their own art and figures can use
Microsoft Excel, molecular-visualization software
and tools such as Adobe Photoshop and Illustrator,
but those without artistic training may find they
need to invest time in learning how to use the
programs.

But they should consider more than just the
money when making that choice. Hiring an
illustrator saved Benz’s two graduate-student
co-authors from a huge time sink. “For them
to not have to spend hours on learning how
to draw a figure was hugely helpful,” she says.
“There was a direct impact on our work.”

And although the right software can help
a researcher to produce simple figures and
visualize single molecules, that will not always
result in a professional-style animation or
illustration. Researchers who are not artists
tend to lack the sense of design and aesthetics
that are keystones of fabulous artwork.
“Where illustrators come in is in their knowledge
of colour theory, using composition to
guide someone's eye around a page or image
in the right order,” says O’Reilly. “Or drawing
their eye to the centre of interest.”

When Berry published a paper about a different
fossil forest, his institution's press office
asked him for images. With no illustrator
accessible at the time, he sketched out trees by
hand and sent his line drawings to a colleague
who helped to add colour. The image is now
widely used on websites, news stories and in
research presentations, Berry says. Although
his drawing was much simpler than Leshyk’s,
the process still took him nearly two weeks.
“It was a lot of fun,” he says. “But I’m not sure
I could do it again. That was the first time
I tried to draw a whole forest to a standard
good enough to let other people look at.” The
experience underscored to him how much
effort, and talent, is required for illustration.
Since then, he has chosen to seek professional
help when he needs artwork.


GET STARTED

Turn science into art

Here are some tips for getting the most
out of the experience of creating art for
science.

• Establish a working relationship with an
illustrator long before you will need her
or him: when you start writing a review
article, for example, or when pursuing
outreach projects for schools or museums.

• Seek out illustrators who have expertise
in areas related to your research and look
through their portfolios for artistic styles
that you like. Scientists typically find
artists through referrals from colleagues
or through online searches for illustrators
in their geographic area or field of study.
The Guild of Natural Science Illustrators
in Washington DC maintains a list of
contacts, and many illustrators share their
own work on Twitter under #sciart.

• Clearly establish the data points that
need to be in the art from the outset, so
that the end product is accurate. But allow
the artist to maximize the visual impact of
their illustration.

• Be bold with ideas. One image isn’t the
definitive description of a scientific theory,
so it’s fine if an image includes some
ambiguity about unknowns or hypotheses
as long as it’s done with sufficient context.

• Seek illustrators who ask questions. You
should aim to find an artist who engages
with your work. J.M.

Yet the value of professional scientific 
illustration has been tough to quantify or 
explain to many. Few, if any, studies have 
examined its impact on a manuscript, 
presentation or grant proposal. But many 
researchers vow that illustrated manuscripts 
get better results. “Anecdotally, people say 
you get more citations, or reviewers are hap- 
pier with a paper, if you have good figures,” 
Patterson says. “Or if you have a cover image,
it’ll get more attention. But the actual data
behind that are lacking.”

Still, researchers agree that whether 
through a simple graphic or a 3D animation, 
the visual communication of science is grow- 
ing increasingly important. Some research- 
ers think that professionally made figures can 
ease a manuscript’s path through peer review. 
Although this is tough to verify, geneti- 
cist Deborah Kurrasch of the University of 
Calgary in Canada says that she has opted 
to work with illustrators many times before 
submitting a paper. And when she’s acting as 
a reviewer, she adds, well-made figures make 
it easier to read and understand the data. 

“Making data into art takes skill,” Berry
says. “If I had the resources, I would always
hire an illustrator.”


Jyoti Madhusoodanan is a freelance writer 
in San Jose, California. 


CORRECTION 

The Careers Feature ‘Doctor's advice’ 
(Nature 533, 429-430; 2016) incorrectly 
described Jelena Kovačević as a
biomedical engineer. She is, in fact, an 
electrical engineer.


TRADE TALK
College ‘mayor’ 


Elise Covic helps 
to design academic 
programmes at 
the College of 

the University of 
Chicago in Illinois. 
She explains what 
she does and how 
she gained the 
experience that 
would launch her 
career there while she was doing her PhD in 
computational neuroscience. 


What do you do as deputy dean at the college?

When I was being recruited for this position, 
I was told I would be the mayor of a small, 
crazy town, and that I wouldn't know what 
problems would hit me when I woke up. 
I make decisions about awards for students 
and faculty, curricula development, develop- 
ment money, disciplinary actions. We have 
initiatives that help faculty members to help 
students to get the most substantive experi- 
ence. I feel so lucky. 


When did you first consider this kind of career?

By the second year of my PhD programme, I 
had begun to think that I didn’t want a career 
that was research-based, but I didn’t want to 
tell my principal investigator (PI). I was at a 
poster session, and my PI was proud of me, 
telling me I should talk to so-and-so about 
postdocs. Right then, I said, “We need to talk. 
I don’t want my own lab.” He said, “I don't
know if I can mentor you, but let me introduce
you to some friends.” He directed me to
the US National Science Foundation’s deputy 
director — she invited me to call and e-mail, 
to come up with a plan. 


What happened next? 

I had an honest conversation with myself: 
what do I like to do? I love science. I love 
organizing. And I like to boss people around, 
so it was clear I could do administration. My 
PI said, “Why don’t you run this undergradu- 
ate research programme with me?” He taught 
me how to administer grants and lab budgets, 
to deal with government agencies, to handle 
regulatory-compliance issues with the uni- 
versity. I had in-depth training. Other people 
could gain similar experience, if they ask the 
right question.


INTERVIEW BY MONYA BAKER 


This interview has been edited for length and
clarity. See go.nature.com/1svOHIM for more.


SCIENCE FICTION


THE MEMORY WARD 


BY WENDY NIKEL 


Liza was losing her memories. She
stared at her morning omelette and
wondered what she’d lost overnight:
which familiar faces had turned into stran- 
gers, which moments were gone, lost like 
bubbles popped in mid-air, leaving no trace 
of their former existence. 

“Coffee?” the nurse asked. Liza couldn't 
recall her name. Had she known it 
yesterday? 

“Old Liza never drinks the stuff.”
Another woman with the same 
straight, white hair and same squar- 
ishness to her jaw shuffled into the 
next seat. Now here’s someone Liza 
could never forget, leastways not 
while she still remembered anything 
at all. Every step of the way, from the 
cradle Gramps carved, to their wild 
days on the wrong side of the law, 
to the Memory Ward in this sec- 
ond-rate nursing home, Cousin 
Jessa had been there. “She'll have 
more orange juice though, if you’ve 
got some.” 

“Morning.” Liza sighed as the nurse 
poured her more orange juice, orange juice 
she didn’t even really want. 

“You've got that look on your face.” Jessa 
frowned. “You took out more memories last 
night?” 

“Yeah, I did.” Liza shrugged sheepishly.
“I know it’s been years, but ... I don’t want 
to forget him.” 

Liza had been losing bits and pieces of 
her life for more than a decade now to that 
wretched disease. At least this way, when 
she extracted them into Memory Cubes, she 
could revisit them whenever she wanted to, 
even if they were gone from her mind. 

Jessa scoffed. “Only a fool like you would
try to save a memory by wiping it from your
mind.”

“Don’t tell me you've never put anything
into the Cubes.”

“Course I have. Don’t ask what.” She
shook a heap of sugar into her coffee. “That's
the difference between us. You put stuff in
there to remember it; I put stuff in there to
forget it.”


Forgive and forget?


Liza must have already tucked away the
memory of the first time she met him,
because she couldn't recall a thing about
it. She remembered later that year, though,
when she told Jessa that she’d promised to go
straight, find an honest job, and take him up
on his proposal. Jessa hadn't taken it too well.

“We're peas in a pod,” Jessa said. “Part- 
ners in crime. And he’s nothing — nothing 
— like us. You marry him and next thing you 
know, he'll be wanting kids and a house with 
a white picket fence and roast beef on the 
table every night. Is that really the kind of 
life you want?” 


That was the one time in her life she 
stood up to Jessa; she couldn't help whom 
she loved. 

Just thinking about him made Liza want 
to see him one last time, the way he was that 
night before their wedding, before every- 
thing went wrong... before a sharp turn on 
a slick road stole away the love of her life. 

Liza searched for a nurse on her way to the 
Cube cabinet. Somehow, although they usu- 
ally hassled her all day, now that she needed 
one, they were nowhere to be found. She 
reached for the call button in her cardigan 
pocket, but the only thing there was the but- 
ter knife from the breakfast table. She really 
was losing her mind. 

Never mind that, she thought, looking 
from the knife to the cabinet. They hadn't 
called her Lock-Pick Liza for nothing. 

Moments later, the cabinet swung open. 
She scanned the Cubes, shooting a guilty 
glance over her shoulder. She couldn't 
remember the numerical code used to identify
her cubes, but fortunately, they were also
engraved with the memory’s date, and
she’d never forget that. She found the
silvery cube with the correct date and
cradled it in her hands.
Then she flicked the switch. 


Suddenly she was back there again. 

She sat beside him in his Jeep, but even 
in the dim light of passing headlights, she 
could tell that his face was pale and damp 
with tears. His hands clenched the wheel. 
The wipers worked furiously. 

Even though it’d been rubbed out from
Liza’s mind, there was no mistaking the
wrongness of this memory. She distinctly
remembered a scene from the funeral a few
days later, when she’d said over and over, “If
only I'd gone with him...” So what was she 
doing here now? And if it wasn’t her mem- 
ory, whose was it? 

“I’ll give you enough to make a start for
yourself elsewhere. Pay off your mama's
debts. Just call off the wedding.” The voice
that came from within her, although feminine
and familiar, was not Liza’s.

“I’m not interested.”

“You think she really knows 
what she’s getting into? That 
she'll really be content with what 

you have to offer her? What a joke. 
She'll be crawling back to me, bored out of 
her mind, before a year’s passed.”

“I’ll take that chance.”

“So you won’t take my offer?”

“Never.”

A thin arm shot out from where Liza 
watched and grabbed the wheel. His cry 
filled the car as it swerved over the white 
line. The woman-who-was-not-Liza flung 
herself out onto the quickly passing ground. 

The car crumpled into a retaining wall. 


Even before the memory had fully faded, 
the Cube clattered to the floor. Liza hunched 
over, clutching her chest. 

All those years... All those lies ... How 
could she? How dare she? 

“Liza?” It was the same voice as in the 
memory. 

Liza reached for the butter knife. She’d
killed before; did she still have it in her? But 
this wasn’t just some rival crony; this was 
Jessa. How could she live with herself? 

The glimmer of a Cube caught her eye. She 
gripped the knife, the answer now crystal 
clear. Jessa wasn't the only one who could forget.
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